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Abstract. We present a space-efficient algorithm to compute the Hilbert class 
polynomial Hr>(X) modulo a positive integer P, based on an explicit form of 
the Chinese Remainder Theorem. Under the Generalized Riemann Hypothesis, 
the algorithm uses 0(|D| 1 / 2+<! logP) space and has an expected running time 
of 0(|D| 1+e ). We describe practical optimizations that allow us to handle 
larger discriminants than other methods, with \D\ as large as 10 13 and h(D) 
up to 10 6 . We apply these results to construct pairing- friendly elliptic curves 
of prime order, using the CM method. 



1. Introduction 

Elliptic curves with a prescribed number of points have many applications, in- 
cluding elliptic curve primality proving [2] and pairing-based cryptography [31]. 
The number of points on an elliptic curve E/F q is of the form N = q + 1 — t, where 
|*| < 2y / q. For an ordinary elliptic curve, we additionally require t ^ mod p, where 
p is the characteristic of F 9 . We may construct such a curve via the CM method. 

To illustrate, let us suppose D < —4 is a quadratic discriminant satisfying 

(1) Aq = t 2 ~v 2 D, 

for some integer v, and let O denote the order of discriminant D. The j-invariant of 
the elliptic curve C/O is an algebraic integer, and its minimal polynomial Hd(X) 
is the Hilbert class polynomial for the discriminant D. This polynomial splits com- 
pletely in Wq, and its roots are the j-invariants of elliptic curves with endomorphism 
ring isomorphic to O. To construct such a curve, we reduce Ho modp, compute 
a root in ¥ q , and define an elliptic curve E/¥ q with this j-invariant. Either E or 
its quadratic twist has N points, and we may easily determine which. For more 
details on constructing elliptic curves with the CM method, see [2, 13, 50]. 

The most difficult step in this process is obtaining Hp, an integer polynomial 
of degree h(D) (the class number) and total size 0(|I?| 1+C ) bits. There are several 
algorithms that, under reasonable heuristic assumptions, can compute Hjy in quasi- 
linear time [5, 12, 22, 27], but its size severely restricts the feasible range of D. The 
bound \D\ < 10 10 is commonly cited as a practical upper limit for the CM method 
[31, 43, 44, 68], and this already assumes the use of alternative class polynomials 
that are smaller (and less general) than Hjj. As noted in [27], space is the limiting 
factor in these computations, not running time. But the CM method only uses 
Hjj mod p, which is typically much smaller than Hjj. 
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We present here an algorithm to compute Hd mod P, for any positive integer P, 
using 0{\D\ 1 /" 1+ ' i log P) space. This includes the case where P is larger than the 
coefficients of Hd (for which we have accurate bounds), hence it may be used to 
determine Hd over Z. Our algorithm is based on the CRT approach [1, 5, 17], 
which computes the coefficients of Hd modulo many "small" primes p and then 
applies the Chinese Remainder Theorem (CRT). As in [1], we use the explicit 
CRT [8, Thm. 3.1] to obtain Hd mod P, and we modify the algorithm in [5] to 
compute Hd mod p more efficiently Implementing the CRT computation as an 
online algorithm reduces the space required. We obtain a probabilistic algorithm 
to compute Hd mod P whose output is always correct (a Las Vegas algorithm). 

Under the Generalized Riemann Hypothesis (GRH), its expected running time 
is 0(|D| 1+£ ). More precisely, we prove the following theorem. 

Theorem 1. Under the GRH, Algorithm 2 computes Hd mod P in expected time 
0(|L>|log 5 iDKlogloglDQ 4 ), using O^D^^^og \D\ + log P) loglog |£>|) space. 

In addition to the new space bound, this improves the best rigorously proven time 
bound for computing Hd under the GRH [5, Thm. 1], by a factor of log 2 \D\. 
Heuristically, the time complexity is OdZ?! 1 / 2 log 3+e \D\). We also describe prac- 
tical improvements that make the algorithm substantially faster than alternative 
methods when \D\ is large, and provide computational results for |D| up to 10 13 
and h(D) up to 10 6 . In our largest examples the total size of Hd is many terabytes, 
but less than 200 megabytes are used to compute Hd modulo a 256-bit prime. 

2. Overview 

Let O be a quadratic order with discriminant D < —4. With the CRT approach, 
we must compute Hd mod p for many primes p. We shall use primes in the set 

(2) Vd = {p > 3 prime : Ap = t 2 - v 2 D for some t, v £ Z >0 }. 

These primes split completely in the ring class field Kq of O, split into principal 
ideals in Q[v A D], and are norms of elements in 0, see [2, Prop. 2.3, Thm. 3.2]. For 
each p e Vd, the positive integers t = t(p) and v = v(p) are uniquely determined. 

We first describe how to compute Hd modp for a prime p € Vd, and then 
explain how to obtain Hd mod P for an arbitrary positive integer P. Let us begin 
by recalling a few pertinent facts from the theory of complex multiplication. 

For any field F, we define the set 

(3) E\\ (F) = {j{E/F):End(E)=0}, 

the j-invariants of elliptic curves defined over F whose endomorphism rings are 
isomorphic to O. There are two possibilities for the isomorphism in (3), but as 
in [5] we make a canonical choice and henceforth identify End(-E) with O. For 
j(E) G FiIIq(F) and an invcrtible ideal a in O, let E[a] denote the group of a- 
torsion points, those points annihilated by every z € a C O = End(-E). We then 
define 

j(E) a =j(E/E[a]). 

The map j(E) j(E) a corresponds to an isogeny with kernel E[a] and degree 
equal to the norm of a. This yields a group action of the ideal group of O on the 
set EUo(Ko), and this action factors through the class group cl(O) = cl(D). 
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For a prime p € Vd, a bijection between Ello(F p ) and EWo(Ko) arises from the 
Deuring lifting theorem, see [49, Thms. 13.12-14]. The following proposition then 
follows from the theory of complex multiplication. 

Proposition 1. For each prime p € Vd'- 

1. Hd(X) splits completely over¥ p . It has h(D) roots, which form Elle>(F p ). 

2. The map j(E) i-> j(E) a defines a free transitive action ofcl(D) on Ello(F p ). 

For further background, we recommend the expositions in [23] and [60], and also 
the material in [49, Ch. 10] and [62, Ch. II]. 

Let p be a prime in Vd ■ Our plan is to compute Hd mod p by determining its 
roots and forming the product of the corresponding linear factors. By Proposition 1, 
we can obtain the roots by enumerating the set Elle>(F p ) via the action of c\(D). 
All that is required is an element of Ello(F p ) to serve as a starting point. Thus we 
seek an elliptic curve E/¥ p with End(-E') = O. Now it may be that very few elliptic 
curves E/¥ p have this endomorphism ring. Our task is made easier if we first look 
for an elliptic curve that at least has the desired Frobenius endomorphism, even if 
its endomorphism ring might not be isomorphic to O. 

For j(E) G Elle>(F p ), the Frobenius endomorphism tt e e End(E) = O corre- 
sponds to an element of O with norm p and trace t. Let us consider the set 

(4) Ell t (F p ) = {j(E/¥ p ) : tr(n E ) = t}, 

the j-invariants of all elliptic curves E/¥ p with trace t. We may regard j € EU t (F p ) 
as identifying a particular elliptic curve E/¥ p satisfying j(E) = j and tr(7rj;) = t, 
since such an E is determined up to isomorphism [23, Prop. 14.19]. We have 
Ello(F p ) C Elh(F p ), and note that Elh.(F p ) = Ell_ t (F p ). 

Recall that elliptic curves E/¥ p and E' /¥ p are isogenous over F p if and only if 
tr(ir E ) = tr(Tr^), see [39, Thm. 13.8.4]. Given j(E) e Elb(F p ), we can efficiently 
obtain an isogenous j(E') € Ello(F p ), provided v(p) has no large prime factors. 

This yields Algorithm 1. Its structure matches [5, Alg. 2], but we significantly 
modify the implementation of Steps 1,2, and 3. 

Algorithm 1. Given p e Vd, compute Hd mod p as follows: 

1. Search for a curve E with j(E) <= Ell t (F p ) (Algorithm 1.1). 

2. Find an isogenous E' with j(E') e E11 (F P ) (Algorithm 1.2). 

3. Enumerate Ell (F p ) from j(E') via the action of cl(D) (Algorithm 1.3). 

4. Compute H D mod p as H D (X) = n je Eii (F p )(^ : _ J)- 

Algorithm 1.1 searches for j(E) e Ell t (F p ) by sampling random curves and 
testing whether they have trace t (or —t). To accelerate this process, we sample a 
family of curves whose orders are divisible by to, for some suitable m\(p + 1 ± t). 
We select p e Vd to ensure that such an to exists, and also to maximize the size of 
ElL,(F p ) relative to F p (with substantial benefit). 

To compute the isogenies required by Algorithms 1.2 and 1.3 we use the classical 
modular polynomial $jv e T\X,Y], which parametrizes elliptic curves connected 
by a cyclic isogeny of degree N. For a prime t ^ p and an elliptic curve E/¥ p , 
the roots of <f>e(X,j(E)) over F p are the j-invariants of all curves E' /¥ p connected 
to E via an isogeny of degree I (an i?-isogeny) [71, Thm. 12.19]. This gives us a 
computationally explicit way to define the graph of ^-isogenies on the set Ell t (F p ). 
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As shown by Kohel [46], the connected components of this graph all have a 
particular shape, aptly described in [29] as a volcano (see Figure 1 in Section 4). 
The curves in an isogeny volcano are naturally partitioned into one or more levels, 
according to their endomorphism rings, with the curves at the top level forming a 
cycle. Given an element of Ell t (F p ), Algorithm 1.2 finds an element of Elle>(F p ) by 
climbing a series of isogeny volcanoes. Given an element of Ello(F p ), Algorithm 1.3 
enumerates the entire set by walking along ^-isogeny cycles for various values of I. 

We now suppose we have computed Hd modulo primes p\ , . . . , p n and consider 
how to compute Hd mod P for an arbitrary positive integer P, using the Chinese 
Remainder Theorem. In order to do so, we need an explicit bound B on the largest 
coefficient of Hd (in absolute value). Lemma 8 of Appendix 1 provides such a B, 
and it satisfies \ogB = 0(\D\^ 2+e ). 

Let M — Y[pi, Mi — M/pi and Oj = mod p- L . Suppose c g Z is a coefficient 
of Hd- We know the values Cj = c mod pi and wish to compute c mod P for some 
positive integer P. We have 

(5) c = ^ c l a i M l mod M, 

and if M > 2B we can uniquely determine c. This is the usual CRT approach. 

Alternatively, if M is slightly larger, say M > 4B, we may apply the explicit 
CRT (mod P) [8, Thm. 3.1], and compute c mod P directly via 

(6) c = ^2 c i a i M i - rM mod p - 

Here r is the nearest integer to ^ Cidi/pi. When computing r it suffices to approx- 
imate each rational number Cidi/pi to within l/(4n). 

As noted in [27] , even when P is small one still has to compute Hd mod Pi for 
enough primes to determine Hd over Z, so the work required is essentially the 
same. The total size of the Cj over all the coefficients is necessarily as big as Hd- 

However, instead of applying the explicit CRT at the end of the computation, 
we update the sums ^CiOiMi mod P and ^Cidi/pi as each Cj is computed and 
immediately discard a. This online approach reduces the space required. 

We now give the complete algorithm to compute Hd mod P. When P is large 
we alter the CRT approach slightly as described in Section 7. This allows us to 
efficiently treat all P, including P = M, which is used to compute Hd over Z. 

Algorithm 2. Compute Hd mod P as follows: 

1. Select primes pi, . . . ,p n g Vd with f\pi > AB (Algorithm 2.1). 

2. Compute suitable presentations of c\(D) (Algorithm 2.2). 

3. Perform CRT prccomputation (Algorithm 2.3). 

4. For each pf. 

a. Compute the coefficients of Hd mod pi (Algorithm 1). 

b. Update CRT sums for each coefficient of Hd (Algorithm 2.4). 

5. Recover the coefficients of Hd mod P (Algorithm 2.5). 

The presentations computed by Algorithm 2.2 are used by Algorithm 1.3 to 
realize the action of the class group. The optimal presentation may vary with pi 
(more precisely, v(pi)), but often the same presentation is used for every pi. Each 
presentation specifies a sequence of primes l\ , . . . , Ik corresponding to a sequence 
ai, . . . , (Xk of generators for cl(D) in which each Qj contains an ideal of norm £i. 
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There is an associated sequence of integers ri , . . . , ru with the property that every 
/3 G cl(-D) can be expressed uniquely in the form 

with < Xi < Ti. Algorithm 1.3 uses isogenics of degrees £i, . . . ,£k to enumerate 
Ello(Fp). Given the large size of <&t{X, Y), roughly 0(£ 3 \og£) bits [21], it is critical 
that the £i are as small as possible. We achieve this by computing an optimal 
poly cyclic presentation for cl(D), derived from a sequence of generators for cl(D). 
Under the Extended Rcimann Hypothesis (ERH) we have £ t < 6 log 2 \D\, by [4]. 
This approach corrects an error in [5] which relies on a basis for cl(-D) and fails to 
achieve such a bound (see Section 5.3 for a counterexample). 

The rest of this paper is organized as follows: 

• Section 3 describes how we find a curve with trace ±t (Algorithm 1.1), 
and how the primes pi, . . . ,p n are selected (Algorithm 2.1). 

• Section 4 discusses isogeny volcanoes (Algorithms 1.2 and 1.3). 

• Section 5 defines an optimal polycyclic presentation of cl(D), 
and gives an algorithm to compute one (Algorithm 2.2). 

• Section 6 addresses the CRT computations (Algorithms 2.3, 2.4, and 2.5). 

• Section 7 contains a complexity analysis and proves Theorem 1. 

• Section 8 provides computational results. 

Included in Section 8 are timings obtained while constructing pairing-friendly curves 
of prime order over finite fields of cryptographic size. 

3. Finding an Elliptic Curve With a Given Number of Points 

Given a prime p and a positive integer t < 2^/p, we seek an element of Elh(F p ), 
equivalently, an elliptic curve E/F p with either No = p + I — t or Ni = p + 1 + t 
points. This is essentially the problem considered in the introduction, but since we 
do not yet know Hp, we cannot apply the CM method. 

Instead, we generate curves at random and test whether #E € {N ,Ni}, where 
#E is the cardinality of the group E(¥ p ). This test takes very little time, given the 
prime factorizations of N and Ni, and does not require computing #E. However, 
in the absence of any optimizations we expect to test many curves: 2^/p + 0(l), on 
average, for fixed p and varying t. Factoring Nq and Ni is easy by comparison. 

For the CRT-based algorithm in [5], searching for elements of Ell t (F p ) dominates 
the computation. In the example given there, this single step takes more than 50 
times as long as the entire computation of Hd using the floating-point method 
of [27] . We address this problem here in detail, giving both asymptotic and constant 
factor improvements. In aggregate, the improvements we suggest can reduce the 
time to find an element of Ell t (F p ) by a factor of over 100; under the heuristic 
analysis of Section 7.1 this is no longer the asymptotically dominant step. 

These improvements are enabled by a careful selection of primes p GVd, which 
is described in Section 3.3. Contrary to what one might assume, the smallest primes 
in Vd are not necessarily the best choices. The expected time to find an element 
of EU t (F p ) can vary dramatically from one prime to the next, especially when one 
considers optimizations whose applicability may depend on No and Ni. In order 
to motivate our selection criteria, we first consider how we may narrow the search 
by our choice of p, which determines t = t{p) and therefore Nq and Ni. 
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3.1. The density of curves with trace ±t. We may compute the density of 
ElL(Fp) as a subset of F p via a formula of Deuring [26]. For convenience we define 

m ( A H(4p-t 2 ) ^ #Ell t (F p ) 

where H(4p — t 2 ) is the Hurwitz class number (as in [18, Def. 5.3.6] or [23, p. 319]). 
A more precise formula uses weighted cardinalities, but the difference is negligible, 
see [23, Thm. 14.18] or [51] for further details. 

We expect to sample approximately l/p(p,t) random curves over F p in order to 
find one with trace ±t. When selecting primes p € Vd, we may give preference to 
primes with larger p- values. Doing so typically increase the average density by a 
factor of 3 or 4, compared to simply using the smallest primes vclVd- It also makes 
iVo and Ai more likely to be divisible by small primes, which interacts favorably 
with the optimizations of the next section. 

Using primes with large p-values improves the asymptotic results of Section 7 
by an 0(log \D\) factor. Effectively, we force the size of Ell t (F p ) to increase with p, 
even though the size of Ello(F p ) is fixed at h(D). This process tends to favor primes 
in Vd f° r which v(p) has many small factors, something we must consider when 
enumerating Ello(F p ) in Algorithm 1.3. 



3.2. Families with prescribed torsion. In addition to increasing the density of 
Ell t (F p ) relative to F p , we can further accelerate our random search by sampling a 
subset of F p in which Ell t (F p ) has even greater density. Specifically, we may restrict 
our search to a family of curves whose order is divisible by m, for some small m 
dividing Ao or Ni (ideally both). We have some control over Ao and .Ai via our 
choice of p G Vd, an d in practice we find we can easily arrange for Ao or N± to be 
divisible by a suitable m, discarding only a constant fraction of the primes in Vd 
we might otherwise consider (making the primes we do use slightly larger). 

To generate a curve whose order is divisible by m, we select a random point on 
Yi(m)/F p and construct the corresponding elliptic curve. Here Yi(m) is the affine 
subcurve of the modular curve Xi(m), which parametrizes elliptic curves with a 
point of order m. We do this using plane models F m (r,s) = that have been 
optimized for this purpose, see [65]. For m in the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 12}, the 
curve Xi(m) has genus 0, and we obtain Kubert's parametrizations [47] of elliptic 
curves with a prescribed (cyclic) torsion subgroup over Q. Working in F p , we may 
use any m not divisible p, although we typically use m < 40, due to the cost of 
finding points on F m (r 1 s) = 0. 

We augment this approach with additional torsion constraints that can be quickly 
computed. For example, to generate a curve containing a point of order 132, it is 
much faster to generate several curves using Ai(ll) and apply tests for 3 and 4 tor- 
sion to each than it is to use Ai(132). A table of particularly effective combinations 
of torsion constraints, ranked by cost/benefit ratio, appears in Appendix 2. 

The cost of finding points on F m (r,s) — is negligible when m is small, but 
grows with the genus (more precisely, the gonality) of Ai(m), which is 0(m 2 ), by 
[42, Thm. 1.1]. For m < 23 the gonality is at most 4 (see Table 5 in [65]), and 
points on F m (r, s) can be found quite quickly (especially when the genus is or 1). 

Provided that we select suitable primes from Vd, generating curves with pre- 
scribed torsion typically improves performance by a factor of 10 to 20. 
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3.3. Selecting suitable primes. We wish to select primes in Vd that maximize 
the benefit of the optimizations considered in Sections 3.1 and 3.2. Our strategy is 
to enumerate a set of primes 

(8) S z = {p e V D : l/p(p, t(p)) < z} 

that is larger than we need, and to then select a subset S C S z of the "best" primes. 
We require that S be large enough to satisfy 

^lgp>6 = lgB + 2, 

pes 

where B is a bound on the coefficients of Hjj(X), obtained via Lemma 8, and "lg" 
denotes the binary logarithm. We typically seek to make S z roughly 2 to 4 times 
the size of S, starting with a nominal value for z and increasing it as required. 
To enumerate S z we first note that if Ap = t 2 — v 2 D for some p £ S z , then 

1 P = P < 

p(p,t) H(Ap-t 2 ) H(-v 2 D)- Z ' 

Hence for a given v, we may bound the p £ S z with v(p) = v by 

(9) p < zH{-v 2 D). 

To find such primes, we seek t for which p = (t 2 — v 2 D)/A is a prime satisfying (9). 
This is efficiently accomplished by sieving the polynomial f 2 — v 2 D, see [24. §3.2.6]. 
To bound v = v(p) for p e S z , we note that p > —v 2 D/A, hence 

(10) -v 2 D < 4zH(-v 2 D). 

For fixed z, this inequality will fail once v becomes too large. If we have 

(11) ? > UzH ^ 

1 ' (loglog(u + 4)) 2 - -D ' 

then (10) cannot hold, by Lemma 9 of Appendix 1. 

Example. Consider the construction of S z for D = —108708, for which we have 
H(-D) = h(D) = 100. We initially set z to -D/(2H(-D)) w 543. For v = 1 this 
yields the interval [-v 2 D/A, zH(-v 2 D)} = [-D/4,-D/2] = [27177,54354], which 
we search for primes of the form (t 2 — D)/4 by sieving t 2 — D with t < y/—2D, 
finding 17 such primes. For v = 2 we have H(—v 2 D) = 300 and search the interval 
[-D, -3D/2] = [108708, 163062] for primes of the form (t 2 - AD) /A, finding 24 of 
them. For v = 3 we have H(—v 2 D) = 400 and the interval [-9D/A, —2D] is empty. 
The interval is also empty for 3 < v < 39, and (11) applies to all v > 39. 

At this point S z is not sufficiently large, so we increase z, say by 50%, obtaining 
z ~ 814. This expands the intervals for v = 1,2 and gives nonempty intervals 
for v — 3,4, and we find an additional 74 primes. Increasing z twice more, we 
eventually reach z ~ 1831, at which point S z contains 598 primes with total size 
around 11911 bits. This is more than twice b = IgB + 2 w 5943, so we stop. The 
largest prime in S z is p = 5121289, with v(p) — 12. 

Once S z has been computed, we select S c S z by ranking the primes p G S z 
according to their cost /benefit ratio. The cost is the expected time to find a curve 
in Elh(Fp), taking into account the density p(p,t) and the m-torsion constraints 
applicable to Nq and Ni, and the benefit is Igp, the number of bits in p. Only a 
small set of torsion constraints are worth considering, and a table of these may be 
precomputed. See Appendix 2 for further details. 
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The procedure for selecting primes is summarized below. We assume that h(D) 
has been obtained in the process of determining B and b = \gB + 2, which allows 
H(—D) and p(p,t) to be easily computed (see (26) and (27) in Appendix 1). 

Algorithm 2.1. Given D, b, and parameters k > 1, S > 0, select S C Vd- 

1. Let z = -D/(2H(-D)). 

2. Compute S z = {peV D : l/p(p,t(p)) < z}. 

3- If S P es ^SP < kb, then set z (1 + (5)z and go to Step 2. 

4. Rank the primes in S z by increasing cost/benefit ratio as p\, . . . ,p nz . 

5. Let S = {pi, . . . ,p n }, with n < n z minimal subject to IgP > 

In Step 3 we typically use k — 2 or k = 4 (a larger fc may find better primes), 
and 5 = 1/2. The complexity of Algorithm 2.1 is analyzed in Section 7, where it 
is shown to run in expected time Od-DI 1 / 2 " 1 ^), under the GRH (Lemma 4). This is 
negligible compared to the total complexity of 0(\D\ 1+e ) and very fast in practice. 

In the D — —108708 example above, Algorithm 2.1 selects 313 primes in S z , the 
largest of which is p = 4382713, with v = 12 and t = 1370. This largest prime is 
actually a rather good choice, due to the torsion constraints that may be applied 
to N = p + 1 — t, which is divisible by 3, 4, and 11. We expect to test the orders 
of fewer than 40 curves for this prime, and on average need to test about 60 curves 
for each prime in S, fewer than 20,000 in all. 

For comparison, the example in [5, p. 294] uses the least 324 primes in Vd, the 
largest of which is only 956929, but nearly 500,000 curves are tested, over 1500 
per prime. The difference in running times is even greater, 0.2 seconds versus 18.5 
seconds, due to optimizations in the testing algorithm of the next section. 

3.4. Testing curves. When p is large, the vast majority of the random curves we 
generate will not have trace ±i, even after applying the optimizations above. To 
quickly filter a batch of, say, 50 or 100 curves, we pick a random point P on each 
curve and simultaneously compute (p+l)P and tP. Here we apply standard multi- 
exponentiation techniques to scalar multiplication in E(¥ p ), using a precomputed 
NAF representation, see [20, Ch. 9]. We perform the group operations in parallel 
to minimize the cost of field inversions, using affinc coordinates as in [45, §4.1]. We 
then test whether (p + 1)P = ±tP, as suggested in [5], and if this fails to hold we 
reject the curve, since its order cannot be p + 1 ± t. 

To each curve that passes this test, we apply the algorithm TestCurveOrder. 
In the description below, H p = \p+ 1 — 2^/p.p+l + 2^/p] denotes the Hasse interval, 
and the index s G {0, 1} is used to alternate between E and its quadratic twist E. 

Algorithm TestCurveOrder. Given an elliptic curve E/¥ p and factored inte- 
gers N , Ni G H p with N < Ni and N + Ni = 2p + 2: 

1. If p < 11, return true if #E £ {N$,Ni} and false otherwise. 

2. Set E E, Ei E, m 4- 1, mi «- 1, and s 0. 

3. Select a random point P € E s (¥ p ). 

4. Use FastOrder to compute the order n s of the point Q = m s P, assuming 
n s divides N s /m s . If this succeeds, set m s <s— m s n s and proceed to Step 5. 
If not, provided that m \N 1 , mi\N , and N < Ni, swap N and Ni and go 
to Step 3, but otherwise return false. 

5. Set a\ 2p + 2 mod m\ and M <— {mox : x € Z}n{mix + ai : x S Z}n"H p . 
If J\f C {Nq, Ni} return true, otherwise set s 1 — s and go to Step 3. 
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TestCurveOrder computes integers m s dividing #-E s by alternately comput- 
ing the orders of random points on E and E. If an order computation fails (this 
happens when n s \ N s /m s ), it rules out A s as a possibility for j^E. If both No 
and Ni are eliminated, the algorithm returns false. Otherwise a divisor n s of N s is 
obtained and the algorithm continues until it narrows the possibilities for #E to a 
nonempty subset of {Ao, Ai} (it need not determine which). The set M computed 
in Step 5 must contain #E, since mo divides #E and mi divides #E (the latter 
implies #E = 2p + 2 mod mi, since #_E + #_E = 2p + 2). The complexity of the 
algorithm (and a proof that it terminates) is given by Lemma 6 of Section 7. 

A simple implementation of FastOrder appears below, based on a recursive al- 
gorithm to compute the order of a generic group element due to Celler and Leedham- 
Green [16]. By convention, generic groups are written multiplicatively, and we do 
so here, although we apply FastOrder to the additive groups E(W P ) and E(¥ p ). 
The function lo(N) counts the distinct prime factors of N. 

Algorithm FastOrder. Given an element a of a generic group G and a factored 
integer N , compute the function A(a, N), defined to be the factored integer M = \a\ 
when M divides N, and otherwise. 

1. If TV is a prime power p n , compute a p for increasing i until the identity is 
reached (in which case return p l ) 1 or i = n (in which case return 0). 

2. Let N = AiA 2 with Ai and A 2 coprime and \u(N{) - u(N 2 )\ < 1. 
Recursively compute M = A(a N2 ,N\)- A(a Nl , N2) and return M. 

This algorithm uses C>(log A log log N) multiplications (and identity tests) in G. A 
slightly faster algorithm [64, Alg. 7.4] is used in the proof of Theorem 1. In practice, 
the implementation of TestCurveOrder and FastOrder is not critical, since 
most of the time is actually spent performing the scalar multiplications discussed 
above (these occur in Step 3 of Algorithm 1.1 below). 

We now give the complete algorithm to find an element of Ell t (F p ). For reasons 
discussed in the next section, we exclude the j-invariants and 1728. 

Algorithm 1.1. Given p e V D , find j e Ell t (F p ) - {0,1728}. 

1. Factor Ao = p + 1 — t and N\ = p + 1 + t, and choose torsion constraints. 

2. Generate a batch of random elliptic curves Ei/¥ p with j(Ei) {0,1728} 
that satisfy these constraints and pick a random point Pi on each curve. 

3. For each i with (p + l)Pj = ±iPj, test whether #^ e {A , Ni} by calling 
TestCurveOrder, using the factorizations of A and N±. 

4. If #Ei e {A , Ai} for some i, output j(Ei), otherwise return to Step 2. 

The torsion constraints chosen in Step 1 may be precomputed by Algorithm 2.1 
in the process of selecting S C Vr>- In Step 2 we may generate Ei with m-torsion 
as described in Section 3.2; as a practical optimization, if Ai(m) has genus we 
generate both Ei and P, using the parametrizations in [3]. In Step 3 the point Pj 
can also be used as the first random point chosen in TestCurveOrder. The 
condition (p+l)Pj = ±tPi is tested by performing scalar multiplications in parallel, 
as described above; when torsion constraints determine the sign of t, we instead 
test whether (p + 1 — t)Pi = or (p + 1 + t)Pi = 0, as appropriate. 
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4. Isogeny Volcanoes 

The previous section addressed the first step in computing Hd mod p: finding an 
element of Ell t (F p ). In this section we address the next two steps: finding an element 
of Elle>(Fp) and enumerating Elle>(F p ). This yields the roots of Hr> mod p. We 
utilize the graph of £-isogenies defined on Ell t (F p ). We regard this as an undirected 
graph, noting that the dual isogeny [61, §111.6] lets us traverse edges in cither 
direction. We permit self-loops in our graphs but not multiple edges. 

Definition 1. Let £ be prime. An l-volcano is an undirected graph with vertices 
partitioned into levels Vq, ■ ■ ■ , Vd, in which the subgraph on Vq (the surface) is a 
regular connected graph of degree at most 2, and also: 

1. For i > 0, each vertex in Vi has exactly one edge leading to a vertex in V$-i, 
and every edge not on the surface is of this form. 

2. For i < d, each vertex in Vi has degree £+1. 

The surface Vo of an ^-volcano is either a single vertex (possibly with a self- loop), 
two vertices connected by an edge, or a (simple) cycle on more than two vertices, 
which is the typical case. We call Vd the floor of the volcano, which coincides with 
the surface when d = 0. For d > the vertices on the floor have degree 1, and in 
every case their degree is at most 2; all other vertices have degree £ + 1 > 2. 

We refer to d as the depth of the I- volcano. The term "height" is also used [54], 
but "depth" better suits our indexing of the levels Vi and is consistent with [46] . 



Figure 1. A 3- volcano of depth 2, with a 4-cycle on the surface. 

Definition 2. For a prime £ p, let Ff it (F p ) be the undirected graph with vertex 
set Ell t (F p ) that contains the edge (ji, J2) if and only if $^(ji,j2) = 0. 

Here $£ denotes the classical modular polynomial. With at most two exceptions, 
the components of T£. t (F p ) are £- volcanoes. The level at which j(E) £ Ell f (F p ) 
resides in its ^-volcano is determined by the power of £ dividing the conductor of 
End(.E). 

The discriminant D may be written as D — u 2 Dk, where Dk is the discriminant 
of the maximal order Ok containing O, and u — [Ok ■ O] is the conductor of O. 
We also have the discriminant 

(12) Dk = t 2 - Ap = v 2 D = w 2 D K 
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of the order Z[7r] C Ok with conductor w — uv, generated by the Frobcnius 
cndomorphism ir with trace t (note it — he for all j(E) <G Ellt(F p )). The order 
O contains Z[tt], and for any j(E) G Ell t (F p ) we have Z[tt] C End(E) C O k . 
Curves with End(_E) = Z[n] lie on the floor of their ^-volcano, while those with 
End(_E) = Ok he on the surface. More generally, the following proposition holds. 

Proposition 2. Let p e Vd an d let I ^ p be a prime. The components ofTg tt (F p ) 
that do not contain j = 0, 1728 are t-volcanoes of depth d = u^(w). Each has an 
associated order Oq, with 7L\ji\ CO C Ok and t \ [Ok ■ Oq], and we have 

j(E) eV t End(£) S Oi, 

where Oi is the order of index l % in Oq . 

Here vg denotes the £-adic valuation (so £ d \w but £ d+1 \ w). The proposition 
follows essentially from [46, Prop. 23]. See [29, Lemmas 2.1-6] for additional details 
and [71, Thm. 1.19, Prop. 12.20] for properties of $£. 

We have excluded j = 0, 1728 (which can arise only when Dk = —3, —4) for 
technical reasons, sec [71, Rem. 12.21]. However a nearly equivalent statement 
holds; only the degrees of the vertices and 1728 are affected. 

4.1. Obtaining an element of Ell (F p ). Given j(E) e EU t (F p ) - {0, 1728}, we 
may apply Proposition 2 to obtain an element of Ello(F p ). Let u and ue be the 
conductors of O and End(S) respectively; both u and ue divide w, the conductor of 
= t 2 — Ap. Suppose vi{ue) ^ ^e(u) for some prime I. If we replace j = j(E) by 
a vertex at level vg{u) in j's ^-volcano, we then have vi(ue) = vi(u). Proposition 2 
assures us that this "adjustment" only affects the power of £ dividing ue- Repeating 
this for each prime £\w, we eventually have ue = u and j(E) e Ello(F p ). 

To change location in an ^-volcano we walk a path, which we define to be a 
sequence of vertices jo,... ,j n connected by edges (jk,jk+i), such that j fc -i 7^ jk+i 
for all < k < n (this condition is enforced by never taking a backward step). 

Paths in r^ ;t (F p ) are computed by choosing an initial edge (jo,ji), and for k > 
extending the path jo, ■ ■ ■ ,jk by picking a root jk+i of the polynomial 

f(X) = <t> e (X,j k )/(X - j k -!) e € F p [x]. 

Here e is the multiplicity of the root jk-i in <&i{X,jk), equal to one in all but a 
few special cases (see [29, Lemma 2.6 and Thm. 2.2]). If f{X) has no roots in F p , 
then jk has no neighbors other than jk-i and the path must end at jk- 

When a path has jk € Vi and jk+i € V^+i, we say the path descends at k. Once 
a path starts descending, it must continue to do so. If a path descends at every 
step and terminates at the floor, we call it a descending path, as in [29, Def. 4.1]. 

We now present an algorithm to determine the level of a vertex j in an £- volcano, 
following Kohel [46, p. 46]. When walking a path, we suppose neighbors are picked 
uniformly at random whenever there is a choice to be made. 

Algorithm FindLevel. Compute the level of j in an i-volcano of depth d: 

1. If deg(j) 7^ t + 1 then return d, otherwise let j\ ^ ji be neighbors of j. 

2. Walk a path of length k\< d extending (j,ji). 

3. Walk a path of length fc 2 < &i extending (j, j 2 ). 

4. Return d — k 2 . 
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If FindLevel terminates in Step 1, then j is on the floor at level d. The paths 
walked in Steps 2 and 3 are extended as far as possible, up to the specified bound. 
If j is on the surface, then these paths both have length d, and otherwise at least 
one of them is a descending path of length fc 2 . In both cases, j is on level d — k 2 . 

We use the algorithms below to change levels in an ^-volcano of depth d > 0. 

Algorithm Descend. Given j e V k ^ V d , return j' e V k +\- 

1. If k — 0, walk a path (j = j , . . . ,j n ) to the floor and return j' = j n -d+i- 

2. Otherwise, let ji and j'2 be distinct neighbors of j. 

3. Walk a path of length d — k extending and ending in j*. 

4. If dcg(j*) = 1 then return j' = ji, otherwise return j' = j2- 

Algorithm Ascend. Given j e Vk ^ V , return j' e Vk-V 

1. If dcg(j) = 1 then let j' be the neighbor of j and return j', 
otherwise let ji, ■ ■ ■ ,je+i be the neighbors of j. 

2. For each i from 1 to I: 

a. Walk a path of length d — k extending (j, ji) and ending in j* . 

b. If deg(j*) > 1 then return j' — ji. 

3. Return j' = j e+1 . 

The correctness of Descend and Ascend is easily verified. We note that if k = 
in Descend, then the expected value of n is at most d + 2 (for any £). 

We now give the algorithm to find an element j' £ Elle>(F p ), given j e Ell t (F p ). 
We use a bound L on the primes £\w, reverting to a computation of the endomor- 
phism ring to address I > L, as discussed below. This is never necessary when D 
is fundamental, but may arise when the conductor of D has a large prime factor. 

Algorithm 1.2. Let p e Vd, let u be the conductor of D, and let w = uv, where 
v = v(p). Given j e Ell t (F p ) - {0, 1728}, find f € Ell (F p ): 

1. For each prime I \w with I < L = max(log \D\, v): 

a. Use FindLevel to determine the level of j in its £- volcano. 

b. Use Descend and Ascend to obtain j' at level ve{u) and set j <- j'. 

2. If u is not i-smooth, verify that j € Ello(F p ) and abort if not. 

3. Return j' = j. 

The verification in Step 2 involves computing End(£?) for an elliptic curve E/¥ p 
with j{E) = j. Here we may use the algorithm in [10], or Kohel's algorithm [46]. 
The former is faster in practice (with a heuristically subexponential running time) 
but for the proof of Theorem 1 we use the O^p 1 ^) complexity bound of Kohel's 
algorithm, which depends only on the GRH. 

For p <G S, we expect v to be small, 0(log 3+e \D\) under the GRH, and heuristi- 
cally OQog 1 / 2 \D\). Provided u docs not contain a prime larger than L, the running 
time of Algorithm 1.2 is polynomial in log \D\, under the GRH. 

However, if u is divisible by a prime t > L, we want to avoid the cost of computing 
I- isogenics. Such an I cannot divide v (since L > w), so our desired j' must lie on 
the floor of its ^-volcano. When I is large, it is highly probable that our initial j is 
already on the floor (this is where most of the vertices in an £- volcano lie) , and this 
will still hold in Step 2. Since L > log \D\ is asymptotically larger than the number 
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of prime factors of u, the probability of a failure in Step 2 is o(l). If Algorithm 1.2 
aborts, we call Algorithm 1.1 again and retry. 

If Dk is —3 or —4, then j may lie in a component of T( t t(¥ p ) containing or 1728. 
However, provided we never pick or 1728 when choosing a neighbor, FindLevel, 
Descend, and Ascend will correctly handle this case. 

4.2. Enumerating Ello(F p ). Having obtained j € Ello(F p ), we now wish to 
enumerate the rest of Elle>(F p ). We assume h(D) > 1 and apply the group action 
of cl(.D) to the set Elle>(F p ). Let I be a prime not dividing the conductor u of D 
with (-j) 7^ —1. Then £ can be uniquely factored in O into conjugate prime ideals 
as (I) = aa, where a and a both have norm I. The ideals a and a are distinct when 
(y) = 1, and in any case the ideal classes [o] and [a] are inverses. The orders of 
[a] and [6] in cl(D) are equal, and we denote their common value by ord£>(^). The 
following proposition follows immediately from Propositions 1 and 2. 

Proposition 3. Let i ^ p be a prime such that l\u and (-j) ^ — 1. Then every 
element o/Ello(F p ) lies on the surface Vq of its l-volcano, and #Vo = ord£>(£). 

If ordu(^) = h(D), then Ello(F p ) is equal to the surface of the ^-volcano contain- 
ing j 7 but in general we must traverse several volcanoes to enumerate Elle>(F p ). 
We first describe how to walk a path along the surface of a single ^-volcano. 

When I docs not divide v, every ^-volcano in r> jt (F p ) has depth zero. In this 
case walking a path on the surface is trivial: for #Vo > 2 we choose one of the two 
roots of $g(X,jo), and every subsequent step is determined by the single root of 
the polynomial f(X) = <&g(X,ji)/(X — ji-i). The cost of each step is then 

(13) 0(^ 2 + M(^)logp) 

operations in F p , where M(n) is the complexity of multiplication (the first term is 
the time to evaluate &e(X, ji), the second term is the time to compute X p mod /). 

While it is simpler to restrict ourselves to primes I \ v (there are infinitely many I 
we might use), as a practical matter, the time spent enumerating Elle>(F p ) depends 
critically on I. Consider I = 2 versus 1 = 1. The cost of finding a root of f(X) 
when / has degree 7 may be 10 or 20 times the cost when / has degree 2. We much 
prefer I = 2, even when the 2- volcano has depth d > (necessarily the case when 
(-j) = 1). The following algorithm allows us to handle I- volcanoes of any depth. 
Algorithm WalkSurfacePath. Given jo € Vo in an (-volcano of depth d and a 
positive integer n < #Vo, return a path jo,ji ■ ■ ■ , j n contained in Vq: 

1. If deg(jo) = 1 then return the path jo,ji, where j\ is the neighbor of jo- 
Otherwise, walk a path jo, ... , jd and set i <— 0. 

2. While deg(j l+d ) = 1, replace . . 7 j l+d by extending the path j , . . . , ji 
by d steps, starting from a random unvisited neighbor j' i+1 of ji. 

3. Extend the path j , . . .,j i+ d to j , . . .,j i+d+1 , then set i <- i + 1. 

4. If i = n then return j , . . . , j„, otherwise go to Step 2. 

When d = the algorithm necessarily returns a path that is contained in Vq. 
Otherwise, the path extending d + 1 steps beyond ji € Vq in Step 3 guarantees that 
ji+i € Vo- The algorithm maintains (for the current value of i) a list of visited 
neighbors of ji to facilitate the choice of an unvisited neighbor in Step 2. 

To bound the expected running time, we count the vertices examined during its 
execution, that is, the number of vertices whose neighbors are computed. 
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Proposition 4. Let the random variable X be the number of vertices examined by 
WalkSurfacePath. If #V = 2 then ELY] = d+ 1 + id/2, and otherwise 

ELY] <d+(l + (£-l)d/2)n. 

Proof. If d = then WalkSurfacePath examines exactly n vertices and the 
proposition holds, so we assume d > and note that deg(jo) > 1 in this case. We 
partition the execution of the algorithm into phases, with phase -1 consisting of 
Step 1, and the remaining phases corresponding to the value of i. At the start of 
phase i > we have ji £ Vb and the path j , ■ ■ ■ , ji+d- Let the random variable Xi 
be the number of vertices examined in phase i, so that X = X_i + Xq + • • • + X n . 
We have A_i = d and X n = 0. For < i < n we have Xi = 1 + md, where m 
counts the number of incorrect choices of ji + i (those not in Vb). 

We first suppose #Vb = 2. In this case exactly one of the £ + 1 neighbors of jo 
lies in Vq. Conditioning on m we obtain 

m=0 fc=0 V 7 m=0 

This yields 

ELY] - ELY_ X ] + ELY ] + ELYi] = d + 1 + ld/2, 

as desired. We now assume #Vb > 2. Then two of j ' s neighbors lie in V and we 
find that ELYo] = 1 + (I — l)d/3. For i > 1 we exclude the neighbor of and 
obtain E[A^] = 1 + (£ — l)d/2. Summing expectations completes the proof. □ 

Using an estimate of the time to find the roots of a polynomial of degree £ in 
FpLY], we may apply Proposition 4 to optimize the choice of the primes I that we 
use when enumerating Elle>(F p ), as discussed in the next section. As an example, 
if (-j) = 1 and vi(y) = 2, then we need to solve an average of roughly 2 quadratic 
equations for each vertex when we walk a path along the surface of a 2-volcano in 
r£,t(F p ). This is preferable to using any £ > 2, even when £ \ v. On the other hand, 
if (f ) = (f ) = 1 and 5\v but 7 \ v, we likely prefer £ = 7 to £ = 5. 

We now present Algorithm 1.3, which, given jo <G Ellei(Fp) and suitable lists of 
primes li and integers r i; outputs the elements of Elle>(F p ) — {jo}- It may be viewed 
as a generalization of WalkSurfacePath to k dimensions. 

Algorithm 1.3. Given j € Ello(F p ), primes £\, . . . ,£k with £i\u and ( j-) ^ —1, 
and integers n, . . . ,rfe, with 1 < < ord£)(^): 

1. Use WalkSurfacePath to compute a path j , ji , . . . , j rk -i of length rk — 1 
on the surface of the £k- volcano containing jo, and output ji, . . . , j rk -\- 

2. If fc > 1 then for i from to r^ — 1 recursively call Algorithm 1.3 using j i7 
the primes £\, . . . ,£k-i, and the integers n, . . . , r^-i. 

Proposition 2 implies that Algorithm 1.3 outputs a subset of Elle>(F p ), since 
jo, ji, . . . , jr fc -i all lie on the surface of the same ffc-volcano (and this applies recur- 
sively). To ensure that Algorithm 1.3 outputs all the elements of Ello(F p ) — {jo}, 
we use a poly cyclic presentation for c\(D), as defined in the next section. 



COMPUTING HILBERT CLASS POLYNOMIALS WITH THE CRT 



15 



5. Polycyclic Presentations of Finite Abelian Groups 

To obtain suitable sequences l\, . . . , Ik and n . . . ,rk for use with Algorithm 1.3, 
we apply the theory of polycyclic presentations [37, Ch. 8]. Of course cl(Z?) is a 
finite abelian group, but the concepts we need have been fully developed in the 
setting of polycyclic groups, and conveniently specialize to the hnite abelian case. 

Let a. — (ai, . . . , a k ) be a sequence of generators for a finite abelian group G, 
and let Gi — (a\, . . . , an) be the subgroup generated by ct\, . . . , ai. The series 

1 = Go < Gi < ■ ■ ■ < G fc _! < G k = G, 

is necessarily a polycyclic series, that is, a subnormal series in which each quotient 
Gi/d-i is a cyclic group. Indeed, d/d-i = (aid-i), and a is a polycyclic 
sequence for G. We say that a is minimal if none of the quotients are trivial. 

When G — Y\(oti), we have d/d-i = (ai) and call a a basis for G, but this 
is a special case. For abelian groups, Gi/Gi-i is isomorphic to a subgroup of (ai), 
but it may be a proper subgroup, even when a is minimal. 

The sequence r(a) = (n, . . . , r^) of relative orders for a. is defined by 

n = \Gi : Gi_i|. 

We necessarily have \\ r i = |G|, and if o; is minimal then ri > 1 for all z. The 
sequences a and r(a) allow us to uniquely represent every (5 G G in the form 

/3 = a x =af •••a^ fc . 

Lemma 1. Let a = (on, . . . , a k ) be a sequence of generators for a finite abelian 
group G, let r(a) = (n, . . . ,rk), and let X(a) = {x £ Z' : < i, < r^}. 

1. For each (3 G G there is a unique x G X(c*) smc/i t/iot f3 = a x . 

2. T/ie vector x such that a^ = a x has Xj = for j > i. 

Proof. See Lemmas 8.3 and 8.6 in [37]. □ 

The vector x is the discrete logarithm (exponent vector) of /? with respect to a. 
The relations = ol x are called power relations, and may be used to define a 
(consistent) polycyclic presentation for an abelian group G, as in [37, Def. 8.7]. 

We now show that a minimal polycyclic sequence for cl(D) provides suitable 
inputs for Algorithm 1.3. 

Proposition 5. Let a. = (a\, . . . ,a k ) be a minimal polycyclic sequence for cl(D) 
with relative orders r(a) = (n, . . . , r k ), and let l\, . . . ,£k be primes for which ai 
contains an invertible ideal of norm li. Given jo G Elle>(F p ) ; the primes li, and the 
integers ri, Algorithm 1.3 outputs each element o/Ello(F p ) — {jo} exactly once. 

Proof. As previously noted, Proposition 2 implies that the outputs of Algorithm 1.3 
are elements of Ell (F p ). Since Yl r i — # C K £) ) = #Elle>(Fp), by Proposition 1, 
we need only show that the outputs are distinct (and not equal to jo). 

To each vertex of the isogeny graph output by Algorithm 1.3 we associate a 
vector x G X(a) that identifies its position relative to j hi the sequence of paths 
computed. The vector (x\, . . . , Xk) identifies the vertex reached from jo via a path 
of length Xk on the surface of the ^-volcano, followed by a path of length Xk-i on 
the surface of the £fe_i-volcano, and so forth. We associate the zero vector to j . 

Propositions 1 and 2 imply that the vector x = (xi, . . . ,Xk) corresponds to the 
action of some f3 x G c\(D). For each integer tk in the interval [0,r/c), the set of 
vectors of the form (*,... ,*,tk) corresponds to a coset of Gfc-i in the polycyclic 
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series for G — cl(D). These cosets are distinct, regardless of the direction chosen 
by Algorithm 1.3 when starting its path on the £fc-volcano (note that and a^ 1 
have the same relative order r^). Proceeding inductively, for each choice of inte- 
gers . . . ,tfc with tj £ [0,r"j) for i < j < fc, the set of vectors of the form 
(*,...,*, ti, tj+i, . . . , tk) corresponds to a distinct coset of Gj_i, regardless of the 
direction chosen by Algorithm 1.3 on the surface of the ^-volcano. Each coset of the 
cyclic group Go corresponds bijectively to a set of vectors of the form (*, ti, . . . , tk)- 
It follows that the (3 X are all distinct. The action of c\(D) is faithful, hence the 
outputs of Algorithm 1.3 are distinct. □ 

5.1. Computing an optimal polycyclic presentation. Let 7 = (71, . . . ,j n ) be 
a sequence of generators for a finite abelian group G, ordered by increasing cost 
(according to some cost function). Then 7 is a polycyclic sequence, and we may 
compute r(-y) = (n, . . . , r n ). If we remove from 7 each 7$ for which r» = 1 and let 
ol = (ai, . . . , afc) denote the remaining subsequence, then a is a minimal polycyclic 
sequence for G. We call a the optimal polycyclic sequence derived from 7. It has 
d\ = 7i with minimal cost, and for i > 1 each on is the least-cost element not 
already contained in = (a±, . . . , a,_i). 

We now give a generic algorithm to compute r("y) and a vector 3(7) that encodes 
the power relations. From r(7) and s(7), we can easily derive a,r(a), and s(a). 
We define s(-y) using a bijection A" (7) ->{zeZ:0<z< \G\} given by: 

(14) Z(aj) = N 3 X J> whcre = I! n - 

l<j<n l<i<j 

For each power relation 7^ = 7 X , we set Sj = Z(x). The formula 

(15) = [si/Njj mod 

recovers the component Xj of the vector x for which s 4 = Z(x). 
Algorithm 2.2. Given 7 = (71, . . . ,7«) generating a finite abelian group G: 

1. Let T be an empty table and call TableInsert(T, 1 g ) (so T[0] = 1 G )- 

2. For z from 1 to n: 

3. Set /3 7i, n <- 1, and A <- TableSize(T). 

4. Until Sj <- TableLookup(T, /3) succeeds: 

5. For j from to N - 1: TableInsert(T, /3 • T\j]). 

6. Set /? /?7i and rj + 1. 

7. Output r(7) = (n,.. .,r„) and s(7) = (si, . . . , s n ). 

The table T stores elements of G in an array, placing each inserted element in 
the next available entry. The function TableLookup(T, (3) returns an integer j 
for which T[j] = (3 or fails if no such j exists (when j exists it is unique). In 
practice lookups are supported by an auxiliary data structure, such as a hash table, 
maintained by TableInsert. When group elements are uniquely identified, as with 
cl(-D), the cost of table operations is typically negligible. 

Proposition 6. Algorithm 2.2 is correct. It uses \G\ non-trivial group operations, 
makes \G\ calls to TableInsert, and makes J2 r i ca ^ s <0 TableLookup. 

Proof. We will prove inductively that T[Z(x)] = ~f x , and that each time the loop 
in Step 4 terminates, the values of rj and Sj are correct and T holds d. 
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When i = l the algorithm computes T\r{\ = 7[ 1 T[0] for n = 1,2,..., until 
7p = T[0] = 1, at which point n = |-y» | , si = 0, and T holds Gi, as desired. 

For i > 1 we have TV = A t _i and T holds G l - l with T[Z(x)] = 7 X , by the 
inductive hypothesis. For n = 1,2,..., if j3 = 7*** is not in T, the algorithm 
computes T[riN + j] = 7pT[j], for < j < N, placing the coset 7pGi_i in T. 
When it finds 7^ = T[si], the table T contains all cosets of the form 7[ 4 Gj_i (since 
G is abelian), hence T holds Gj. It follows that = |Gj : Gj_i| and is correct. 

When the algorithm terminates, T holds G„ = G, and every element of G is in- 
serted exactly once. A group operation is performed for each call to TableInsert, 
but in each execution of Step 5 the first of these is trivial, and we instead count 
the non-trivial group operation in Step 6. The number of calls to TableLookup 
is clearly the sum of the which completes the proof. □ 

The complexity of Algorithm 2.2 is largely independent of 7. When 7 contains 
every element of G, Algorithm 2.2 is essentially optimal. However, if 7 has size 
n = odGI 1 / 2 ), we can do asymptotically better with an O^G] 1 ^ 2 ) algorithm. 
This is achieved by computing a basis a. for G via a generic algorithm (as in 
[14, 64, 66, 67]), and then determining the representation of each 7$ = a x in this 
basis using a vector discrete logarithm algorithm (such as [64, Alg. 9.3]). It is then 
straightforward to compute |G,| for each i and from this obtain rj = |Gj : Gj_i|. 
The power relations can then be computed using discrete logarithms with respect 
to 7. In the specific case G = cl(D), one may go further and use a non-generic 
algorithm to compute a basis a in subexponential time (under the ERH) [34] , and 
apply a vector form of the discrete logarithm algorithm in [69] . 

5.2. Application to cl(D). For the practical range of D, the group G = cl(D) is 
relatively small (typically |G| < 10 s ), and the constant factors make Algorithm 2.2 
faster than alternative approaches; even in the largest examples of Section 8 it 
takes only a few seconds. Asymptotically, Algorithm 2.2 uses 0(\D\ 1 / 2+e ) time and 
OQD] 1 / 2 log 2 \D\) space to compute an optimal polycyclic sequence for cl(-D). In 
fact, under the GRH, we can compute a separate polycyclic sequence for every v(p) 
arising among the primes p <G S that are selected by Algorithm 2.1 (Section 3.3) 
within the same complexity bound, by Lemma 3 (Section 7). 

We uniquely represent elements of cl(D) with primitive, reduced, binary qua- 
dratic forms ax 2 + bxy + cy 2 , where a corresponds to the norm of a reduced ideal 
representing its class. For the sequence 7 we use forms with a = I prime, con- 
structed as in [15, Alg. 3.3]. Under the ERH, restricting to I < 6 log 2 \D\ yields 
a sequence of generators for cl(D), by [4]. To obtain an unconditional result, we 
precompute h(D) and extend 7 dynamically until Algorithm 2.2 reaches N = h(D). 

We initially order the elements 7$ of 7 by their norm £i, assuming that this 
reflects the cost of using the action of 7$ to enumerate Elle>(F p ) via Algorithm 1.3 
(Section 4.2). However, for those li that divide v(p) we may wish to adjust the 
relative position of 7$, since walking the surface of an fi-volcano with nonzero 
depth increases the average cost per step. We use Proposition 4 to estimate this 
cost, which may or may not cause us to change the position of 7$ in 7. In practice 
just a few (perhaps one) distinct orderings suffice to optimally address every v(p). 

Note that we need not consider the relative orders ?*j when ordering 7. If i is less 
than j, then Algorithm 1.3 always takes at least as many steps using £i as it does 
using ij. Indeed, the running time of Algorithm 1.3 is typically determined by the 
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choice of a\: at least half of the steps will be taken on the surface of an .^-volcano, 
and if (^) = 1, almost all of them will (heuristically). 

5.3. Why not use a basis? Using a basis to enumerate Ello(F p ) is rarely optimal, 
and in the worst case it can be a very poor choice. The ERH does imply that cl(D) 
is generated by the classes of ideals with prime norm I < 6 log 2 \D\, but this set of 
generators need not contain a basis. As a typical counterexample, consider 

D 1 = -10007- 10009- 10037, 

the product of the first three primes greater than 10000. The class group has order 
h(Di) = 2 2 - 44029, where 44029 is prime, and its 2-Sylow subgroup H is isomorphic 
to Z/2Z x Z/2Z. Every basis for cl(Di) must contain a non-trivial element of H, 
and these classes have reduced representatives with norms 10007, 10009, and 10037, 
all of which are greater than 6 log 2 |£>i| w 4583. 

By comparison, Algorithm 2.2 computes an optimal polycyclic sequence for 
cl(£>i) with £i = 5 and £ 2 = 37 (and relative orders n = 88058 and r 2 = 2). 

6. Chinese Remaindering 

As described in Section 2, for each coefficient c of the Hilbert class polynomial 
we may derive the value of c mod P (for any positive integer P) from the values 
Ci = c mod pi appearing in Hp mod pi (for pi e S) , using an explicit form of the 
Chinese Remainder Theorem (CRT). We apply 

(6) c = ^2 c i a i M i - rM mod p , 

where M — ]Jpi, Mi — M/p i7 en — M^ 1 modpi, and r is the closest integer to 
s = J2cidi/pi. Recall that S C Vd is chosen so that M > AB, where B bounds 
the coefficients of Hd, via Lemma 8. It suffices to approximate each term in the 
sum s to within l/(4n), where n = #S. If p M denotes the largest pi, we need 
0(log(n(p M + logn))) = 0(\ogp M ) bits of precision to compute r. 

To minimize the space required, we accumulate C = ^CidiMi mod P and an 
approximation of s as the c» are computed. This uses <3(logP + logp M ) space per 
coefficient. We have h(D) coefficients to compute, yielding 

(16) 0(h(D)(logP + logp M )) 

as our desired space bound. 

To achieve this goal without increasing the time complexity of our algorithm, we 
consider two cases: one in which P is small, which we take to mean 

(17) logP < ^ilog 3 \D\, 

for some absolute constant [i, and another in which P is large (not small). The 
former case is typical when applying the CM method; P may be a cryptographic- 
size prime, but it is not unreasonably large. The latter case most often arises when 
we actually want to compute Ho over Z. When P > M there is no need to use the 
explicit CRT and we apply a standard CRT computation. To treat the intermediate 
case, where P is large but smaller than M, we use a hybrid approach. 

The optimal choice of /i depends on the relative cost of performing h(D) multi- 
plications modulo P versus the cost of computing Hp mod p^, we want the former 
to be small compared to the latter. In practice, the constant factors allow us to 
make \i quite large and the intermediate case rarely arises. 



COMPUTING HILBERT CLASS POLYNOMIALS WITH THE CRT 



19 



6.1. Fast Chinese remaindering in linear space. Standard algorithms for fast 
Chinese remaindering can be found in [70, §10.3]. We apply similar techniques, but 
use a time/space trade-off to achieve the space bound in (16). These computations 
involve a product tree built from coprime moduli. In our setting these are the primes 
Pi € S, which we index here as po, ■ ■ ■ ,p n -i- 

We define a product tree as a leveled binary tree in which each vertex at level fc is 
either a leaf or the product of its two children at level fc+1 (we require levels to have 
an even number of vertices and add a leaf to levels that need one) . It is convenient 
to label the vertices by bit-strings of length fc, where the root at level is labeled 
by the empty string and all other vertices are uniquely labeled by appending the 
string "0" or "1" to the label of their parent. 

Let d — [lg(n — 1)J + 1 be the number of bits in the positive integer n—1. For 
integers i from to n — 1, we let b(i) € {0, l} d denote the bit-string corresponding 
to the binary representation of i. The products m x are defined by placing the 
moduli in leaves as m b ^ = p il setting m x = 1 for all other leaves, and defining 
irix — rrixomxi for all internal vertices. 

The modular complements rn x — m/m x mod m x are then obtained by setting 
too = toi mod too and mi = too mod mi, and defining 

m z o = m x m x i mod to^o and m x i = m x m x o mod m x \. 

In terms of Mj = M/pi, we then have to = M and mf,u) — Mi mod pi. 

Let Ik denote the labels at level fc, for 1 < fc < d (and otherwise Ik is empty). 
One way to compute rrid is as follows: 

1. For fc from d to 1, compute m x for x e Ik- 

2. For fc from 1 to d, compute ra x for x € Ik- 

This uses 0(M(logM) logn) and O(logMlogn) space. Alternatively: 

1. For fc from 1 to d: 

2. For j from d to fc, compute m x for x e Ij (discard m y for y e Ij+i)- 

3. Compute m x for x e Ik (discard m y for y e Ik and m z for z e Ik-i)- 

This uses 0(M(log M) log 2 n) time and 0(log M) space. In general, storing [log" n] 
levels uses 0(M(logM)log 2 ""n) time and 0(logMlog w n) space, for < cu < 1. 

6.2. Applying the explicit CRT when P is small. Assume logP < /xlog 3 
We index the set S c "Pd as 5 = {po, ■ ■ ■ ,Pn-i} and let M — Y\pi and Mi = M/pi. 
As above, we define products m x and modular complements m x — m/m x mod m x , 
and similarly define modular complements m' x = m/m x mod P. 

Algorithm 2.3 (precompute). Given S = {po, ■ ■ ■ ,p n -i} a nd P: 

1. Compute m x and m' x . Save M mod P. 

2. Use TOh(i) = Mi mod pi to set A'/^ 1 mod p,. 

3. Use rn' b ,^ = Mi mod P to set rfj a^M^ mod P. 

4. Set C 3 <- and Sj <- for j from to /i(£>). 

Using the time/space trade-off described above, Algorithm 2.3 has a running time 
of (9(M(logM) log 2 n), using 0(logM + nlogP) space. 

We now set 5 = flgn] +2, which determines the precision of the integer Sj ~ 2 s r 
we use to approximate the rational number r in (6). 
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Algorithm 2.4 (update). Given Hd modp, with coefficients cy. 

1. For j from to h(D): 

2. Set Cj <- Cj + Cjdi mod P. 

3. Set Sj <— Sj + [2 s Cjai/pi\ . 

The total running time of Algorithm 2.4 over all Pi £ S may be bounded by 

(18) 0(nh(D)M(log P) + h(D)M{\og M + n log n)) . 

Typically the first term dominates, and it is here that we need logP = 0(log 3 \D\). 
The space complexity is 0(h(D)(logP + logp M + logn)). 

Algorithm 2.5 (postcompute). After computing Hd mod pi for all pi G S: 

1. For j from to h(D): 

2. Set Cj <r- Cj - L3/4 + 2~ 5 Sj\M mod P. 

3. Output Hd mod P with coefficients Cj. 

Algorithm 2.5 uses 0(h(D)M(\ogP)) time and 0(h(D) log P) space. The formulas 
used by Algorithms 2.4 and 2.5 are taken from [8, Thm. 2.2] (also see [7]). 

6.3. Applying the CRT when P is large. When P is larger than M, we simply 
compute Hd G Z[A] using a standard application of the CRT. That is, we compute 
Hd mod pi for pi G S, and then apply 

(5) c = V] CifliMi mod M 

to compute each coefficient of P D using fast Chinese remaindering [70, §10.3]. 
Since M > 2P, this determines Hd G Z[X]. Its coefficients lie in the interval 
(— P/2,P/2), so we regard this as effectively computing Hd mod P. The total 
time spent applying the CRT is then 0{h{D)M(\ogM) logn), and the space needed 
to compute (5) is O(logMlogn), which is easily smaller than the 0(h(D)logM) 
bound on the size of Hd (so no time/space trade-off is required). 

When P is smaller than M but logP > "log 3 \D\, we combine the two CRT 
approaches. We group the primes p , . . . ,p n -i into products q , . . . , qk-i so that 
log qj « logP (or qj > logP is prime). We compute Hd mod qj by applying the 
usual CRT to the coefficients of Hd modp,, after processing all the pi dividing qj. 
If qj is prime no work is involved, and otherwise this takes 0(M(logP) logn) time 
per coefficient. We then apply the explicit CRT to the coefficients of Hd mod qj, 
as in Section 6.2, discarding the coefficients of Hd mod qj after they have been 
processed by Algorithm 2.4. This hybrid approach has a time complexity of 

(19) 0(h(D) (log Mj log P) M (log P) log n) = 0(h(D) M (log M) logn), 
and uses 0(h(D)(\ogP + logp M )) space. 

7. Complexity Analysis 

We now analyze the complexity of Algorithms 1 and 2, proving Theorem 1 
through a series of lemmas. To do so, we apply various number-theoretic bounds 
that depend on some instance of the extended or generalized Riemann hypothesis. 
We use the generic label "GRH" to identify all statements that depend (directly or 
indirectly) on one or more of these hypotheses. As noted in the introduction, the 
GRH is used only to obtain complexity bounds, the outputs of Algorithms 1 and 2 
are unconditionally correct. 
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Let M(n) denote the cost of multiplication, as denned in [70, Ch. 8]. We have 
(20) M(n) = O(nlognllogn), 

by [57], where llog(n) denotes log log n (and we use lllog(n) to denote log log log n). 
We focus here on asymptotic results and apply (20) throughout, noting that the 
larger computations in Section 8 make extensive use of algorithms that realize this 
bound. See Section 7.1 for a practical discussion of M(n). 

Let us recall some key parameters. For a discriminant D < —4, we define 

(2) Vd = {p > 3 prime : 4p = t 2 - v 2 D for some t, v £ Z >0 }, 

where t — t(p) and v = v(p) are uniquely determined by p. We select a subset 

SCS z = {peV D : p/H(-v(p) 2 D) < z}, 

that satisfies Ilpes? 3 > ^-Bj wncre B bounds the absolute values of the coefficients 
of H D . We also utilize prime norms £i, . . . ,4 arising in a polycyclic presentation 
of cl(D) that is derived from a set of generators. 

(GRH) For convenient reference, we note the following bounds: 

(i) h = h(D) = 0(\D\ 1 ' 2 Uog ( see [52]). 

(ii) 6 = lgB + 2 = 0(|L>| 1 /2i g|D|Uog|D|) (Lemma 8). 

(iii) n = #S = Od^l 1 / 2 Uog \D\) (follows from (ii)). 

(iv) £ M = max{4, . . . ,4} = 0(log 2 \D\) (see [4]). 

(v) z = OdDl 1 / 2 log 3 \D\ Uog \D\) (Lemma 2). 

(vi) p M = max5 = 0(\D\ log 6 \D\ Uog 8 \D\) (Lemma 3). 

(vii) v M = max{v(p) : p e S} = 0(\og 3 \D\ Uog 4 |D|) (Lemma 3). 

The first three parameters have unconditional bounds that are only slightly larger 
(see [5, §5.1]), but the last four depend critically on either the ERH or GRH. 
Heuristic bounds are discussed in Section 7.1. 

To prove (v) we use an effective form of the Chebotarev density theorem [48] . 
Recall that Vd is the set of primes (greater than 3) that split completely in the ring 
class field Kq of O. For a positive real number x, let 7Ti(x, Kq/Q) count the primes 
p < x that split completely in Kq. Equivalently, iri(x, Kq/Q) counts primes whose 
image in Gal(Ko/Q) under the Artin map is the identity element [23, Cor. 5.21]. 
Applying Theorem 1.1 of [48] then yields 



(21) 



ni{x,K /Q) L,( ' r) 



/ x 1 / 2 loe(\D\ h( - D '>x 2h ( D ')) 



2h(D) 

as in [5, Eq. 3], where the constant ci is effectively computable. 

Lemma 2 (GRH). For any real constant c 3 there is an effectively computable 
constant c 2 such that z > c 2 /i(£>)log 3 \D\ implies #S Z > czh(D) log 3 \D\. 

Proof. Let h = h(D). We apply (21) to x = c h 2 log 4 |D|, with cq to be determined. 
We assume D < — 4 and logco > 2, which implies log a; < 41ogco log |D| (using 
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h < \D\ and log|D| < |£>| 1/2 ), and Li(x) > x/\ogx, for all x > 1. Negating the 
expression within the absolute value, we obtain from (21) the inequality 

Mx,Ko/Q) > (t^ 2 5 Cl V^logc ) fclog 3 |£>|. 

V 8 log c J 

Thus given any constant c 4 we may effectively determine Co > e 2 (using ci) so that 

in(x,Ko/Q) > c 4 h\og 3 \D\. 

For the set R x of primes in Vd bounded by x, we have #R X — ni{x, Kq/Q) — 2. 

Let v n be the least integer such that at least half the primes in R x have v(jp) < v . 
There are vq positive integers less than or equal to vq, and any particular value 
v(p) < v can arise for at most 2^/x primes p e R x , since t(p) < 2^/p < 2y/x. 
Therefore 2vo^/x > #R x /2, and this implies 

2v 0y ^h\og 2 \D\ > (c 4 /ilog 3 \D\ - 2)/2 > (c 4 /2 - l)Mog 3 \D\. 

We thus obtain v > c$ log \D\, where C5 = (c 4 /2 — l)/\/4co, and assume c 4 > 2. 
For primes p £ R x with w(p) > w 0l the lower bound in Lemma 9 implies 

H(-v(pYD) - v(p)H(-D) ~ c 5 h\og\D\ = ( c o/c 5 )Mog 3 |^|. 

If z > c 2 /ilog 3 |£)|, with c 2 = C0/C5, then S z contains at least half the primes in R x . 
Setting c 4 = max{2c3 + 2, 3} determines Co, C5, and C2, and completes the proof. □ 

The primes p e 5 Z are enumerated by Algorithm 2.1 (Section 3.3), which grad- 
ually increases z until X) P es 2 > where b — IgB + 2. 

Lemma 3 (GRH). When Algorithm 2.1 terminates, for every prime p € S z we 
have the bounds p = 0(\D\ log 6 \D\ Hog 8 \D\) and v(p) = 0(log 3 \D\ Hog 4 \D\). 

Proof. Let D = u 2 Dk, where u is the conductor of D. The upper bound in 
Lemma 9, together with the bound (i) on h(D), implies that for a suitable constant 
C2 and sufficiently large \D\, the bound 

H{-v 2 D) < 12uvH(-D K )\\og 2 {uv + A)) < c 2 v\D\ 1/2 llog \D\ Hog 2 (i; | Z? | ) 

holds for all positive integers v. 

Lemma 2, together with bounds (i) and (ii), implies that Algorithm 2.1 achieves 
Y^pesJgP > 2b with z = 0(h(D) log 3 \D\) = O^D] 1 / 2 log 3 \D\ llog \D\). Thus for 
a suitable constant C3 and sufficiently large the bound 

(22) p < zH(-v(p) 2 D) < c 3 v(p)\D\ log 3 \D\ llog 2 \D\ Wog 2 {v{p)\D\) 

holds for all p € S z . We also have v(p) < 2^p/\D\, since 4p = t(p) 2 — v{p) 2 D. 
Applying this inequality to (22) yields p = 0(\D\ log 6 \D\ llog 8 \D\), which then 
implies v = C»(log 3 \D\ llog 4 \D\). □ 

We could obtain tighter bounds on p M and v M by modifying Algorithm 2.1 to 
only consider primes in R X I~)S Z , but there is no reason to do so. Larger primes will 
be selected for S only when they improve the performance. 

To achieve the space bound of Theorem 1, we assume a time/space trade-off is 
made in the implementation of Algorithm 2.1. We control the space used to find 
the primes in S z , by sieving within a suitably narrow window. This increases the 
running time by a negligible poly-logarithmic factor. 



COMPUTING HILBERT CLASS POLYNOMIALS WITH THE CRT 



23 



Lemma 4 (GRH). The expected running time of Algorithm 2.1 is 0(\D\^ 2+e ), 
using OQD] 1 ^ 2 log \D\ Hog \D\) space. 

Proof. When computing S z , it sufBces to consider v up to an 0(log 3+c \D\) bound, 
by Lemma 3 above. For each v we sieve the polynomial f{t) = t 2 — v 2 D to find 
f(t) = Ap with p prime. The bound on p implies that we need only sieve to an 
L = OdDl 1 / 2 log 3+£ \D\) bound on t. Wc may enumerate the primes up to L in 
O(LllogL) time using 0(VLlogL) = (9(|L'| 1 / 4+e ) space (we sieve with primes up 
to \f~L to identify primes up to L using a window of size VZ). 

For each of the n(L) primes £ < L, we compute a square root of —v 2 D modulo I 
probabilistically, in expected time 0(M(log^) log^), and use it to sieve fit). Here 
we sieve using a window of size 0(\D\ X / 2 log \D\ llog \D\), recomputing each square 
root 0(log 2+e \D\) times in order to achieve the space bound. 

For each w, the total cost of computing square roots is 0(tt(L) log 4+c which 
dominates the cost of sieving. Applying n(L) = 0(L/ log L) and summing over v 
yields <3(|L>| 1/2 log 9+e \D\), which dominates the time to select S C S z . 

To stay within the space bound, if we find that increasing z in Step 3 by a factor 
of 1 + 5 causes S z to be too large (say, greater than 4b bits), we backtrack and 
instead increase z by a factor of 1 + 5/2 and set 5 5/2. We increase z a total of 
0(log \D\) times (including all backtracking), and the lemma follows. □ 

In practice we don't actually need to make the time/space tradeoff described in 
the proof above. Heuristically we expect p M = 0(\D\ log 1+e \D\), and in this case 
all the primes in S z can be found in a single pass with L = O^D] 1 / 2 log 1 ^ 2+e \D\). 

We now show that all the precomputation steps in Algorithm 2 take negligible 
time and achieve the desired space bound. This includes selecting primes (Algo- 
rithm 2.1 in Section 3.3), computing polycyclic presentations (Algorithm 2.2 in 
Section 5.1), and CRT precomputation (Algorithm 2.3 in Section 6.2). 

Lemma 5 (GRH). Steps 1, 2, and 3 of Algorithm 2 take 0(|D| 1 / 2+£ ) expected 
time and use OQD} 1 / 2 (log \D\ + log P) llog \D\) space. 

Proof. The complexity of Step 1 is addressed by Lemma 4 above. By Proposition 6, 
Step 2 performs h(D) operations in cl(D), each taking 0(log 2 \D\) time [9]. Even if 
we compute a different presentation for every v < v M , the total time is 0(\D\ 1 / 2+e ). 
The table used by Algorithm 2.2 stores h(D) — 0(\D\ X / 2 llog \D\) group elements, 
by bound (i), requiring O^D^/ 2 log \D\ llog \D\) space. 

As described in Section 6.2, when logP < ^log 3 |D| the complexity of Algo- 
rithm 2.3 is 0(M(log M) log 2 n) time and (9(logM + nlogP) space, and we have 

logM = J2 l ogp < nlogp M = 0(\D\ 1/2 log \D\ llog |D|), 

pes 

according to bounds (hi) and (vi) above. As discussed in Section 6.3, the same time 
and space bounds for precomputation apply when logP > /ilog \D\. □ 

We next consider TestCurveOrder (Section 3.4), which is used by Algo- 
rithm 1.1 to find a curve in Ell t (F p ). We assume [64, Alg. 7.4] is used to implement 
the algorithm FastOrder which is called by TestCurveOrder. 

Lemma 6. TestCurveOrder runs in expected time 0(log 2 pllog 2 p). 
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Proof. For s = 0, 1 the integer m s computed by TestCurveOrder is the 1cm of 
the orders of random points in E s (¥ p ). By [64, Thm. 8.1] we expect 0(1) points 
yield m s — X(E s (¥ p )), the group exponent of E s (¥ p ). For p > 11, Theorem 2 and 
Table 1 of [25] then imply J\f C {7V ,iVi}, forcing termination. We thus expect to 
execute each step O(l) times. We now bound the cost of Steps 2-5: 

2. The non- residue used to compute E can be probabilistically obtained using 
an expected Oilogp) operations in F p , via Euler's criterion. 

3. With E s in the form y 2 = /(x), we obtain a random point (x, y) by comput- 
ing the square-root of f(x) for random x G F p , using an expected O(logp) 
operations in F p to compute square roots (probabilistically). 

4. Computing Q = m s P uses O(logp) group operations in E s (¥ p ). The fac- 
torization of N s /m s is obtained by maintaining m s in factored form. Im- 
plementing FastOrder via [64, Alg. 7.4] uses 0(logpllogp/ lllogp) group 
operations on E s (¥ p ), by [64, Prop. 7.3]. 

5. The intersection of two arithmetic sequences can computed with the ex- 
tended Euclidean algorithm in time 0(log 2 p), by [70, Thm. 3.13]. 

Step 4 dominates. The group operation in E s (¥ p ) uses O(l) operations in F p , each 
with bit complexity 0(M(logp)), and this yields the bound of the lemma. □ 

We are now ready to bound the complexity of Algorithm 1 (Section 2), which 
computes Hrj mod p using Algorithm 1.1 (Section 3.4), Algorithm 1.2 (Section 4.1), 
and Algorithm 1.3 (Section 4.2). 

Lemma 7 (GRH). For p € S, Algorithm 1 computes Hr> modp with an expected 
running time of O^D] 1 / 2 log 5 |D| Hog 3 \D\), using O^D] 1 / 2 log Hog |D|) space. 

Proof. Ignoring the benefit of any torsion constraints, Algorithm 1.1 expects to 
sample p/ H(—v 2 D) < z random curves over F p to find j £ Ell t (F p ). The cost of 
testing a curve is 0(log 2 pllog 2 p), by Lemma 6, and this bound dominates the cost 
of any filters applied prior to calling TestCurveOrder. 

Applying bound (v) on z and bound (vi) on p M yields an overall bound of 

(23) Od^l 1 / 2 log 5 |,D|llog 3 |Z^|) 

on the expected running time of Algorithm 1.1, and it uses negligible space. 

Algorithm 1.2 finds j € Elle>(F p ) in polynomial time if the conductor of D 
is small, and otherwise its complexity is bounded by the 0(p 1 ^ 3 ) = 0(|D| 1 / 3+e ) 
complexity of Kohel's algorithm (under GRH). In either case it is negligible. 

As shown in [4], the ERH yields an 0(log 2 \D\) bound on the prime norms needed 
to generate cl(D), even if we exclude norms dividing v (at most 0(llog \D\) primes). 
It follows that every optimal polycyclic presentation used by Algorithm 1.2 has 
norms bounded by £ M = 0(log 2 |-D|). To bound the running time of Algorithm 1.3 
we assume £i \ i>, since we use li\v only when it improves performance. 

The time to precompute each is 0(£ 3 i +c ) = (9(log 6+£ \D\), by [28], and at 
most 0(log \D\) are needed. These costs are negligible relative to the desired bound, 
as is the cost of reducing each $^ modulo p. Applying the bound on £ M and 
bound (vi) on p M , each step taken by Algorithm 1.3 on an fi-isogeny cycle uses 
0(log 4 |D|) operations in F p , by (13). A total of h steps are required, and the 
bounds (i) on h and (vi) on p yield a bit complexity of O^D] 1 ' 2 log 5 \D\ llog 2+£ |D|) 
for Algorithm 1.3, using 0{h\gp) = 0(|£>| 1/2 log|L»|llog|D|) space. 
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Step 4 of Algorithm 1 computes Y[(X — j) over j € Elle>(F p ) via a product tree, 
using 0(M(h) log h) operations in F p and space for two levels of the tree. Applying 
bound (i), this uses 0(|£>| log 3+e \D\) time and O^D] 1 ' 2 log \D\ Hog \D) space. □ 

The time bound in Lemma 6 may be improved to O^D^' 2 log 5 |D| Hog 2 |D|) by 
arguing that a random point on a random elliptic curve over ¥ p has order greater 
than 4^/p with probability 1 — 0(l/logp). 

Theorem 1 (GRH). Algorithm 2 computes H D mod P m 0(\D\ log 5 \D\ Hog 4 \D\) 
expected time, using O^D} 1 / 2 (log \D\ + logP) Hog \D\) space. 

Proof. Lemma 5 bounds the cost of Steps 1-3. As previously noted, if we have 
P > M = YlpesP' we sct P = M and compute Ho over Z. 

Algorithm 1 is called for each p G S, of which there are n = OQD} 1 / 2 Hog \D\), 
by bound (iii). Applying Lemma 7, Algorithm 1 computes Hjj mod p for all p e S 
within the time and space bounds stated in the theorem. 

Recalling (18) from Section 6.2, for logP < /ilog 3 \D\ the total cost of updating 
the CRT sums via Algorithm 2.4 is bounded by 

(24) O (nhM (log P) + /iM (log M + n log nj) . 

We have logM < nlogp M = O^D^/ 2 log \D\ Hog \D\), by bounds (iii) and (vi), 
thus (24) is bounded by 0(\D\ log 3+£ \D\), using bound (i) on h. The cost of Algo- 
rithm 2.5 in Step 5 is 0(hU(\ogP)) = 0(| J D| 1 / 2+e ), with logP = (9(log 3 \D\). The 
space required is 0(h(log \D\ + logP)), which matches the bound in the theorem. 

For logP > ^ilog 3 \D\, we apply the hybrid approach of Section 6.3, whose costs 
are bounded in (19). Using the bounds on logM, n, and h, we again obtain an 
0(|£)| log 3+e \D\) time for all CRT computations, and the space is as above. □ 

The CRT approach is particularly well suited to a distributed implementation; 
one simply partitions the primes in S among the available processors. The precom- 
putation steps in Algorithm 2 have complexity 0(|£>| 1/,2+e ), under the GRH, and 
this is comparable to the complexity of Algorithm 1. Parallelism can be applied 
here, but in practice we are happy to repeat the precomputation on each processor. 

When logP is polynomially bounded in log|D|, the postcomputation can be 
performed in time 0(|Z?| 1 / 2+£ ) by aggregating the CRT sums, with the final result 
Hd mod P available on a single node. When P is larger, as when computing Hp 
over Z, we may instead have each processor handle the postcomputation for a subset 
of the coefficients of Hp, leaving the final result distributed among the processors. 

We do not attempt a detailed analysis of the parallel complexity here, but note 
the following corollary, which follows from the discussion above. 

Corollary 1 (GRH). There is a parallel algorithm to compute Hjj mod P on 
0(|P| 1 / 2+£ ) processors that uses 0(|D| 1 / 2+£ ) time and space per processor. 

7.1. A heuristic analysis. To obtain complexity estimates that better predict 
the actual performance of Algorithms 1 and 2, we consider a naive probabilistic 
model. We assume that each positive integer m is prime with probability 1/ log to, 
and that for each prime i \ D we have (-j) = 1 with probability 1/2. For a 
prime £ with (-j) = 1 we further assume that if a^aT 1 e cl(D) are distinct classes 
containing an ideal of norm £, then a corresponds to a random element of cl(Z?) 
uniformly distributed among the elements of order greater than 2. Most critically, 
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we suppose that all these probabilities are independent. This last assumption is 
obviously false, but when applied on a large scale this model yields empirically 
accurate predictions. 

Compared to the GRH-based analysis, these assumptions do not change the 
space complexity, nor bounds (i)-(iii), but significantly improve bounds (iv)-(vii). 

(H) Our heuristic model predicts the following: 

(iv) £ M = 0(log 1+e |D|). 

(v) z = 0(\D\ 1 / 2 log 1/2+e \D\). 

(vi) p M = 0(|^|log 1+e |C|). 

(vii) v M = 0(log 1 ' 2+e \D\). 

Applying these to the analysis of Section 7 yields an 0(|-D| log 3+£ \D\) bound on 
the expected running time of Algorithm 2, matching the heuristic result in [5]. 

It is claimed in [5, §5.4] that applying the bounds (i) and (ii) to [27, Thm. 1.1] 
also yields a heuristic complexity of 0(|D| log 3+e \D\) when using the floating- 
point method to compute Hp. This is incorrect, the implied bound is actually 
0(|£>| log 4+e \D\) (as confirmed by the author of [27]). 

One may reasonably question how accurate our 0(|L>|log 3+e |D|) estimate is m 
practice, since it assumes the Fast Fourier Transform (FFT) is used for all multi- 
plications. The cost M(n) arises in three distinct contexts: 

(a) The cost of operations in F p is bounded by 0(M(logp)). 

(b) Finding a root of <&t(X, ji)/(X — ji-i) uses 0(M(£) logp) F p -operations. 

(c) Computing Yl(X — j) uses 0(M(h) log h) F p -operations. 

In case (a) we actually expect lgp M to be smaller than the word size of our CPU, so 
multiplications in F p effectively have unit cost. For (b), I is typically in the range 
where either schoolbook or Karatsuba-based multiplication should be used. It is 
only in case (c) that FFT-based algorithms may be profitably applied. 

In order to better estimate the running time of Algorithm 1 (which effectively 
determines the running time of Algorithm 2) we break out the cost of each step, 
expressing all bounds in terms of F p -operations. 

Step Complexity (F p -operations) 

1. Find j i EU t (Fp) 0(\D\^ 2 log 3/2+£ \D\) 

2. Find j' e Ell (Fp) negligible 

3. Enumerate E11 (F P ) 0(\D\^ 2 log 1+u+£ \D\) 

4. Compute n je EU (F p )(^ ~ j) Q(\D\ 1/2 \D\) 

Table 1. (H) Heuristic complexity of Algorithm 1. 

The value of oj depends on our estimate for M(£). One can find values of D in 
the feasible range where t m is over 300, see [40, 41], and here it is reasonable to 
assume M(^) = f with uj — lg 3 « 1.585. In the worst case, Step 3 dominates. 

However, the critical parameter is £i, the least cost ii used by Algorithm 1.3. If 
t\ \ D we expect it to be used in the overwhelming majority of the steps taken by 
Algorithm 1.3. As with £ M , it is possible to find feasible D for which l\ is fairly large 
(over 100), but such cases are extremely rare. If we average over D in some large 
interval, our heuristic model predicts i\ — 0(1) (in fact E[£i] < 4). We typically 
have M(£i) = 0(1) and use uj = 0. In almost all cases, Step 4 dominates. 
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The relative cost of Step 4 is not significant for small \D\, due to the excellent 
constant factors in the algorithms available for polynomial multiplication, but its 
asymptotic behavior becomes evident as \D\ grows (see Tables 3 and 4). 



To assess the performance of the new algorithm in a practical application, we 
used it to construct pairing-friendly curves suitable for cryptographic use, a task 
that often requires large discriminants. We constructed ordinary elliptic curves of 
prime order and embedding degree k over a prime field F g such that either 

k = 6 and 170 < lg q < 192, or k = 10 and 220 < lg q < 256. 

These parameters were chosen using the guidelines in [31], and have particularly 
desirable performance and security characteristics. For additional background on 
pairing-based cryptography we refer to [20, Ch. 24]. 

To obtain suitable discriminants we used algorithms in [44] (for k = 6) and [30] 
(for k = 10) that were optimized to search for q within a specified range. This 
produced a set T> PF of nearly 2000 fundamental discriminants (1722 with k = 6 
and 254 with k = 10), with \D\ ranging from about 10 7 to just over 10 13 (almost 
all greater than 10 10 ). We selected 200 representative discriminants from T> PF for 
our tests, including those that potentially posed the greatest difficulty, due to an 
unusually large value of l\ or h(D). 

To each selected discriminant we applied the CM method, using Algorithm 2 to 
compute Ho mod P (with P = q). After finding a root j of Hu{X) over ¥ q , we 
construct an elliptic curve E with this j-invariant and ensure that the trace of E 
has the correct sign. 1 

8.1. Implementation. The algorithms described in this paper were implemented 
using the GNU C/C++ compiler [63] and the CMP library [33] on a 64-bit Linux 
platform. Multiplication of large polynomials was handled by the zn.poly library 
developed by Harvey [35] , based on the algorithm in [36] . 

The hardware platform included sixteen 2.8 GHz AMD Athlon processors, each 
with two cores. Up to 32 cores were used in each test (with essentially linear 
speedup), but for consistency we report total cpu times, not elapsed times. Memory 
utilization figures are per core, and can be achieved using a single core. 

8.2. Distribution of test discriminants. To construct a curve of odd order over 
a field of odd characteristic we must have D = 5 mod 8, and this necessarily applies 
to D e T> PF . We then have (-j) = —1, which implies i\ > 3, and also tends to 
make h(D) smaller than it would be for an arbitrary discriminant. Averaging over 
all discriminants up to an asymptotically large bound, we expect 



where C = U p (l - l/(p 2 (p + 1))), see [18, p. 296] (and see [40] for actual data). 
Among the 1722 discriminants we found for k = 6, the average value of L(1,xd) 
is about 0.55, close to the typical value for D = 5 mod 8. For k — 10 we have the 
further constraint t\ > 7, and the average value of L(1,xd) is approximately 0.40. 

^^One may apply the method of [56], or simply compute NQ for a nonzero point Q g E(¥ q ), 
where N is the desired (prime) order of E(¥ q ), and switch to a quadratic twist of E if NQ 7^ 0. 
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Example 1 


Example 2 


Example 3 


1 Dl 


19 ^rq s^n on'? 

Id, tJUy, 0<JU, UUO 


1 1 rno ^87 


19 QD1 800 

1^, jUlj OUU, OOtJ 






1 1 98fl 




M 1 ) XD) 


n ^4 


n 94 


1 ^,1 





9 979 ^fifi 


1 QKQ 1 

i,ooy,ioo 


c; /icq 770 


H 


Uo,OOZ 




1 49 874 


z 


755,637 


734,040 


905,892 




40 


38 


43 


V m 


12 


8 


32 




^720203^ 


(17 1128 ,19 10 ) 


(3 27038 ,5 2 ) 


Step 1 


0.0s 


0.0s 


0.0s 


Step 2 


1.2s 


0.5s 


4.0s 


Step 3 


0.6s 


0.3s 


2.0s 


Step 4 


23,300s 


26,000s 


61,000s 


Step 5 


0.0s 


0.0s 


0.0s 


(T ( ,T e ,T h ) 


(57,32,11) 


(51,47,2) 


(53,20,27) 


throughput 


2.0Mb/s 


0.6Mb/s 


4.9Mb/s 


memory 


3.9MB 


2.1MB 


9.4MB 


total data 


5.7GB 


1.9GB 


37GB 


Solve H D (X) = over ¥ q 


127s 


86s 


332s 



Table 2. Example computations. 
(2.8 GHz AMD Athlon) 



While we regard the discriminants in 2? PF as representative for the application 
considered, in order to assess the performance of Algorithm 2 in more extreme 
cases we also conducted tests using discriminants with very large values of L(l, xd)- 
These results are presented in Section 8.5. 

8.3. Examples. Table 2 summarizes computations for three discriminants of com- 
parable size, with \D\ s=y 10 10 . These represent a typical case (Example 1) and two 
"worst" cases (Examples 2 and 3). The parameters appearing in the top section of 
the table are as defined in Section 7. The next section of the table contains timings 
for each step of Algorithm 2. 

As predicted by the asymptotic analysis, essentially all of the time is spent in 
Step 4, which calls Algorithm 1 for each prime p E S. There are three principal 
components in the running time of Algorithm 1: 

Tf. time spent in Step 1 finding a curve in Ell t (F p ); 

T c : time spent in Step 3 enumerating Elle)(F p ); 

T b : time spent in Step 4 building H D (X) = Ilj e E,ii (¥ p )( X - •?') mod P- 
These are listed in Table 2 as percentages of the total time T. The time spent 
elsewhere (T - T f -T e - T h ) is well under 1% of T. 

The third section in Table 2 lists the throughput, memory utilization, and total 
data processed during the computation. 2 The total data is defined as the product 
of the number of coefficients h(D) and the height bound b. This approximates the 

2 The suffixes Mb, MB, and GB indicate 10 6 bits, 10 6 bytes and 10 9 bytes, respectively. 
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\D\ 


h(D) 


cpu sees 


(T f ,T e ,T b ) 


Mb/s 


memory 


data 


116,799,691 


2, 112 


156 


(65,28,7) 


2.6 


0.5MB 


52MB 


1,218,951,379 


6,320 


1,650 


(64,26,8) 


2.5 


1.1MB 


520MB 


13,569,850,003 


20,203 


23,400 


(57,33,10) 


2.0 


3.9MB 


5.7GB 


126,930,891,691 


56,282 


195,000 


(66,22,12) 


2.0 


9.5MB 


50GB 


1,009,088,517,019 


181,584 


2,160,000 


(64,20,16) 


2.0 


34MB 


535GB 


10,028,144,961,139 


521,304 


20,600,000 


(63,20,17) 


1.9 


84MB 


5.0TB 



Table 3. Performance for typical D E V, 
(2.8 GHz AMD Athlon) 



total size of Hp, typically overestimating it by about 10% (the actual sizes of Hd 
for the three examples are 5.3GB, 1.8GB and 34GB respectively). The throughput 
is then the total data divided by the total time. Memory figures include all working 
storage and overhead due to data alignment (to word boundaries and to powers of 2 
in FFT computations), but exclude fixed operating system overhead of about 4MB. 
The last row of the table lists the time to find a root of Hd over ¥ q , although this 
task is not actually performed by Algorithm 2. 

Example 1 represents a typical case: L(1,xd) is close to the mean of 0.55, and 
l\ = 7 is just above the median of 5 (over D € T> PP ). Example 2 has an unusually 
large l\ = 17 (exceeded by fewer than 1% of D £ T> PF ), while Example 3 has an 
unusually large L(l, xd) ~ 1.51 (exceeded by fewer than 1% of D £ T> PF ). 

In Example 2, the large t\ increases T e substantially, despite the smaller h(D). 
The smaller L(l, xd) tends to increase the running time of individual calls to Al- 
gorithm 1.1, but at the same time n decreases so that overall Tf decreases slightly. 
The smaller values of h(D) and n both serve to decrease Tb significantly. 

In Example 3 the large L(1,xd) decreases the cost of individual calls to Algo- 
rithm 1.1, but increases n substantially so that overall Tf increases. However, T 
and Tb increase even more, especially Tb. Despite the longer running time, this 
scenario results in the highest throughput of the three examples. 

8.4. Scaling. Table 3 summarizes the performance of Algorithm 2 for D £ T> PF 
ranging over six orders of magnitude. We selected examples whose performance 
was near the median value for discriminants of comparable size. We note the quasi- 
linear growth of T, and the increasing value Tb as a percentage of T, consistent 
with our heuristic prediction that this component is asymptotically dominant. 

Up to 32 cores were applied to the computations in Table 3. In all but the 
smallest example we can effectively achieve a 32x speedup. The actual elapsed 
time for the largest discriminant was about 8 days, while the second largest took 
less than a day. As suggested by Corollary 1 , these computations could be usefully 
distributed across many more processors. The low memory requirements provide 
headroom for much larger computations: each of our cores had 2GB of memory, 
but less than 100MB was used. 

Below is an example of a curve constructed using D = —10,028, 144,961, 139, 
the largest discriminant listed in Table 3. The elliptic curve 

y 2 = x 3 - 3x + 3338561401570133202017008597803337396411439360229378547 
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I11L.H1U1 y 


U CI L CI 


2,093,236,031 


100,000 


98,800 


(25,19,56) 


7.5 


18MB 


93GB 


8,364,609,959 


200,000 


472,000 


(24,17,59) 


6.8 


36MB 


400GB 


17,131,564,271 


300,000 


1,240,000 


(20,15,65) 


5.9 


61MB 


920GB 


30,541,342,079 


400,000 


2,090,000 


(21,16,63) 


6.4 


71MB 


1.7TB 


42,905,564,831 


500,000 


3,050,000 


(22,17,61) 


6.9 


81MB 


2.6TB 


67,034,296,559 


600,000 


5,630,000 


(18,14,68) 


5.6 


121MB 


3.9TB 


82,961,887,511 


700,000 


7,180,000 


(19,14,67) 


5.9 


132MB 


5.3TB 


113,625,590,399 


800,000 


9,520,000 


(19,15,66) 


5.9 


142MB 


7.1TB 


133,465,791,359 


900,000 


11,500,000 


(20,15,65) 


6.2 


152MB 


9.0TB 


170,868,609,071 


1,000,000 


14,200,000 


(20,16,64) 


6.3 


163MB 


11.2TB 



Table 4. Performance when L(1,\d) is large. 

(2.8 GHz AMD Athlon) 



has embedding degree 6 over the finite field F 9 with 

q = 30518311673028635209000068713843412774183984182022701057. 

This curve has prime order TV = q + 1 — t, where 

t = 5524338120809463560527395583. 

There are a total of h(D) = 521,304 nonisomorphic curves with the same order 
that may be constructed using Hp mod q. A complete list of curves for all the 
discriminants tested is available at http://math.mit.edu/~drew. 

8.5. Discriminants with large L(l, Xd)- Table 4 shows the performance of Al- 
gorithm 2 on discriminants specifically chosen to make L(1,xd) extremely large, 
between 6.8 and 7.8. These discriminants are not in 2? PF , and are likely the small- 
est possible for the class numbers listed (but we do not guarantee this). In each 
case we computed Hp modulo a 256-bit prime P. The timings would not change 
significantly for larger P, but the space would increase. 

The first discriminant D = — 2, 093, 236, 031 in Table 4 also appears in Table 1 of 
[27]. Scaled to the same processor speed, Algorithm 2 computes Hp mod P using 
less than half the cpu time spent by the floating-point approximation method to 
compute a class polynomial over Z for the same D (this polynomial would then 
need to be reduced mod P in order to apply the CM method). Most significantly, 
the memory required is about 20 MB versus 5 GB. 

This comparison is remarkable, given that the height bound b — 7, 338, 789 for 
Hd is nearly 28 times larger than the 264,727 bits of precision used in [27], where 
the class polynomial for the double-eta quotient 033^3 was computed instead of the 
Hilbert class polynomial. The difference in throughput is thus much greater than 
the difference in running times: 7.5 Mb/s versus 0.10 Mb/s. 
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Appendix 1 

This appendix proves Lemma 8, which bounds the coefficients of the Hilbert 
class polynomial Hn(X), and Lemma 9 which bounds the Hurwitz class number 
H{—v 2 D) in terms of v and H(—D). 

Let B denote an upper bound on the absolute values of the coefficients of Hp (X). 
In the literature one finds many values for B (or log-B), but due to an unfortunate 
series of typographical errors, most are either incorrect [5, p. 285], exponentially 
larger than necessary ([2, Eq. 22] and [11, p. 151]), or heuristics that do hold for 
all D ([1, Eq. 3.1] and [12, p. 2431]). In [27, Thm. 1.2], Enge gives a rigorous and 
fully explicit value for B that is empirically accurate to within a constant factor, 
but still larger than desirable for practical application. Provided one is prepared 
to enumerate the elements of cl(D), a much tighter bound is given by the lemma 
below, whose proof is derived directly from Enge's analysis in [27, §4]. 

Lemma 8. For a quadratic discriminant D < 0, let (di, b\, Ci), . . . , (ah, bh, Ch) be 
the sequence of reduced, primitive binary quadratic forms of discriminant D with 
< a\ < ••• < ah, where h = h(D). Let M k = exp(7ry / |I?|/afe) + C, where 
C = 2114.567. Then the coefficients of Hd{X) have absolute values bounded by 



B =(I)vnM, 



h+l 



. We also have \ogB = O^D] 1 / 2 log 2 \D\), and under the GRH, 



where m - y-j^ 
logB = 0(\D\^ 2 log|D|llog|D|). 

Proof. We may write H D as 

h 

H D (x)=ii(x-j{T k )), 

k=0 

where tu = (— fefe + VD)/2ak- With q^ — e 27TlTk , we have Mk = |l/<Zfc| + C, where 
the constant C bounds |j(rfe) — l/<Zfc|, as shown in [27, p. 1094]. Thus |j( r fc)l ^ 
and the absolute value of the coefficient of X n in Hrj(X) is bounded by 

(25) B n = )l[M k . 

^ ' k=l 

We now argue that B n < B. For n > m we have n > (h + l)/ (Mh + 1) and 

'")/( h \J^±l<M b . 
K nJ I \n— I J n 

This implies B n < S„_i. For < n < m we have (£) / ( > Mh, which implies 
Bo < BiMh/Mh < B 2 M h - 1 M h /Ml <■■■< B m M h - m +i ■ ■ ■ M h /M h n = B. 

It follows that B bounds every B n . 

The bound logB = 0(|L>'| 1/2 log 2 \D\) follows from h = 0(|Z5| 1/2 log |i^|), as 
proven in [59], and the bound E fe ^" = °( lo g 2 l-°l)> 

proven in [58, Lemma 2.2]. 
As shown in [5, Lemma 2], under the GRH the bound J2k^ = 0(log \D \ Hog \D\) 
follows from [52], which yields \ogB = 0(\D\^ 2 log \D\ llog |D|). □ 
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In practice the bound given by Lemma 8 is close to, and often better than, 
the heuristic bound B = (^ 2 j) ex P( 7T \^\D\ J2k ^) tnat ^ s sometimes used, even 
though the latter bound is not actually valid for all D (such as D = —99). 

Lemma 9. Let D be a negative discriminant, let v > 2 be an integer, and let x 
be the largest prime for which Y\ p<x p < v, where p ranges over primes. Let H(n) 
denote the Hurwitz class number. The following bounds hold: 



Proof. Let u be the conductor of D, so that D = u 2 Dq. Then 
(26) H{ ^ D) = H{ . {uvfDc} = J: ?MM, 

d\uv 

where w(d 2 D ) = \0* 2D J is 2, 4, or 6 [18, Lemma 5.3.7]. We also have [18, p. 233] 
h(d 2 D ) _ MA)) .tt A X P " 



w(d 2 D ) w(D ) 11 V V 

p\d 



where %p — ("§r) ^ s — 1> 0> or 1- Regarding Z? as fixed, we note that 

W (A) ^ V P / 



d\n p\d 

is a multiplicative function of n, which yields 
2h(D ) 



p 



where v p (n) is the p-adic valuation. From (26) we obtain 

H(-v 2 D) = U P (1 + ( P ^ + ^ - 1) (p - X P )/(P - 1)) 
( j vH(-D) vU p (l+(p^ u) -l)(p-X P )/(p-l)) ' 

Fixing I? = u 2 Do, we regard (27) as a multiplicative function of v. For v = p k : 

H(-p 2 kD) = (l+( P ^ +t -l) (P-X P )/(P-1)) 
p k H(-D) pk (1 + ( p M«) - 1) (p - xp)/(p - 1)) ' 

This value is minimized when \p = 1) m which case it is 1, yielding the first 
inequality in the lemma. It is maximized when x P = — L in which case one finds 

(l + ( p M")+fc-l)(p+l)/(p-l)) < p+± 
p fe (l+(p^(")-l)(p + l)/(p-l)) ~ p-1' 

for all nonnegative integers k and v p (u). We thus obtain from (27) 

H(-v 2 D) < - rf P+ 1 ^-nP+ 1 



TT l —— < TT i— — 
11 p - l - J- 1 p - l 



vH(—D) 

p\v p<x 
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proving the second inequality in the lemma. To prove the third inequality, we first 
note that for v > Y\ p<x p the inequality holds for each prime x < 41, by a machine 
calculation, so we assume x > 41. We then have 



p-i ^ V p-i/ ^p-i ^ v 

p<x p<x p<x P<x PS X 

We now apply the bound J2 P < X p < Hog a; + £?i + l/(loga:) 2 from [55, 3.20], where 
Bi = 0.261497 . . ., and also £ p = 0.773156 . . . from [19], to obtain 

logTT < 2 Hog x + 2.218, 

1 p - 1 

valid for a; > 41. This yields IIp-c^ f~~r < 9-189 • log 2 x. We also have the bound 
x(l — 1/logx) < ^2 p<x ^ogp 7 valid for x > 41, by [55, 3.16], which implies 

TT ill < 9.189 • log 2 (1.369 • logv), 
V" P ~ 1 

For x > 41 we have logw > 30, and the RHS is then smaller than 11 llog 2 (i; + 4). □ 



Appendix 2 

Here we list some of the torsion constraints used to accelerate the search for an 
elliptic curve E/¥ p with p+l±t points, as described in Section 3. Each constraint 
has the form m — a ■ b ■ N , where a is a power of 2 and b is a power of 3. Curves 
with a point of order N are generated using a plane model for X\ (N) as in [65] , 
then filtered to ensure that the constraints implied by a and b are also met. When 
a or b is expressed in exponential notation, it is meant to control the exact power 
of 2 or 3 that divides #E. The torsion constraint 14 = 2° • 3° • 14, for example, 
indicates that #E is divisible by 14 but not divisible by 3 or 4. 

Efficient methods for analyzing the Sylow 2-subgroup of E(¥ p ) are considered 
in [53, 65], and for 3-torsion we use the 3-division polynomial [71, § 3.2]. For the 
sake of brevity, here we consider constraints on the Sylow 2-subgroup only up to 
4-torsion, but one may obtain minor improvements using 2 fc -torsion for larger k. 

The benefit of each constraint is computed as 1/r, where r is the proportion of 
elliptic curves E/¥ p that satisfy the constraint. We derive r using [38, Thm. 1.1], 
under the simplifying assumption that if N divides #E, then E(¥ p ) contains a 
point of order TV (necessarily true when the square part of N is coprime to p — 1). 
A more precise estimate may be obtained from [32, Thm. 3.15]. Table 5 assumes 
that p = 1 mod 3 and p ^ 1 mod I for primes I > 3 dividing N. It is easily adjusted 
to other cases via [38, Thm. 1.1]; this will change the rankings only slightly. 

The cost of each constraint was determined empirically (and is somewhat imple- 
mentation dependent). For a random set of primes p of suitable size (30-50 bits) we 
measured the average time to: (1) generate a curve E/¥ p satisfying the constraint, 
(2) obtain a random point P e E(¥ p ), and (3) compute the points (p+l)P and tP. 
This is compared to the cost of (2) and (3) alone (the "null case" for Algorithm 1.1, 
excluding TestCurveOrder which is rarely called). The parametrizations of [3] 
combine (1) and (2), enabling a cost of less than 1.0 in some cases. 
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torsion 


IJCI1CI11 


cost 


ratio 


vet 


torsion 


OCI1C11L 


cost 


ratio 


33 


2°-3- 


11 


80.0 


2.3 


34.3 


44 


4- 


11 


24.0 


1.8 


13.0 


39 


2°-3- 


13 


96.0 


3.0 


31.5 


93 


2°-3- 


31 


240.0 


18.9 


12.7 


51 


2°-3- 


17 


128.0 


4.4 


29.0 


34 


2 1 - 


17 


64.0 


5.0 


12.7 


15 


2°- 


15 


32.0 


1.2 


26.4 


28 


2 • 


14 


14.4 


1.2 


12.4 


11 


2°- 


11 


30.0 


1.2 


25.9 


52 


4- 


13 


28.8 


2.4 


12.2 


57 


2°-3- 


19 


144.0 


5.7 


25.4 


18 


2°- 


18 


26.2 


2.2 


12.0 


66 


2 1 -3- 


11 


106.7 


4.3 


24.7 


36 


2 • 


18 


15.7 


1.3 


12.0 


21 


2° -3 


• 7 


48.0 


2.1 


23.1 


68 


4- 


17 


38.4 


3.4 


11.2 


69 


2°-3- 


23 


176.0 


7.8 


22.4 


38 


2 1 - 


19 


72.0 


6.7 


10.8 


78 


2 1 -3- 


13 


128.0 


5.8 


22.0 


10 


2°- 


10 


16.0 


1.5 


10.7 


13 


2°- 


13 


36.0 


1.6 


21.8 


174 


2 1 • 3 • 


29 


298.7 


28.6 


10.4 


9 


2° 


• 9 


19.6 


1.0 


20.2 


20 


2 • 


10 


9.6 


0.9 


10.4 


102 


2 1 -3- 


17 


170.7 


8.5 


20.0 


348 


4-3- 


29 


179.2 


18.0 


9.9 


42 


2° -3- 


14 


64.0 


3.2 


20.0 


76 


4- 


19 


43.2 


4.4 


9.9 


7 


2° 


• 7 


18.0 


0.9 


19.6 


46 


2 1 - 


23 


88.0 


9.2 


9.5 


132 


4-3- 


11 


64.0 


3.3 


19.2 


29 


2°- 


29 


84.0 


8.9 


9.5 


17 


2°- 


17 


48.0 


2.5 


19.0 


48 


3 • 


16 


21.3 


2.3 


9.4 


156 


4-3- 


13 


76.8 


4.2 


18.2 


3 


2° 


• 3 


8.0 


0.9 


9.2 


204 


4-3- 


17 


102.4 


5.9 


17.5 


92 


4- 


23 


52.8 


6.0 


8.8 


114 


2 1 -3- 


19 


192.0 


11.0 


17.4 


12 




12 


6.4 


0.7 


8.8 


30 


2 1 - 


15 


42.7 


2.5 


16.9 


186 


2 1 • 3 • 


31 


320.0 


36.9 


8.7 


84 


2-3- 


14 


38.4 


2.3 


16.5 


31 


2°- 


31 


90.0 


11.5 


7.8 


19 


2°- 


19 


54.0 


3.3 


16.4 


6 


2° 


• 6 


10.7 


1.4 


7.4 


22 


2 1 - 


11 


40.0 


2.5 


16.2 


16 




16 


8.0 


1.1 


7.2 


228 


4-3- 


19 


115.2 


7.4 


15.7 


58 


2 1 - 


29 


112.0 


17.4 


6.4 


87 


2°-3- 


29 


224.0 


14.5 


15.5 


116 


4- 


29 


67.2 


11.0 


6.1 


138 


2 1 -3- 


23 


234.7 


15.3 


15.4 


8 




8 


4.0 


0.7 


5.9 


26 


2 1 - 


13 


48.0 


3.3 


14.4 


62 


2 1 - 


31 


120.0 


23.0 


5.2 


23 


2°- 


23 


66.0 


4.7 


14.2 


124 


4- 


31 


72.0 


14.2 


5.1 


276 


4-3- 


23 


140.8 


10.0 


14.0 


2 


2° 


• 2 


4.0 


0.9 


4.3 


14 


2°- 


14 


24.0 


1.7 


13.8 


4 




4 


2.4 


0.6 


3.8 


60 


4- 


15 


25.6 


1.9 


13.5 


1 


2° 


• 1 


3.0 


0.8 


3.7 


5 


2° 


• 5 


12.0 


0.9 


13.0 















Table 5. Ranking of m-torsion constraints (for p = 1 mod 3). 
Dominated constraints are not listed, e.g. 3 • 4 • 31 is always inferior to 12. 



The rankings in Table 5 assume each constraint is applicable to both iV = p+l—t 
and Ni = p+ 1 +t; if not, the effective ratio is about half the listed value (9/16, on 
average). For given values of p and t, we thus consider three possible constraints, 
one satisfied by 7V , one by Ni, and one by both, and pick the best of the three. 
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